System and method for inputting text

ABSTRACT

A system comprising a feature identification means configured to generate one or more features from a plurality of samples, wherein each of the plurality of samples is sampled at a different time and corresponds to a location of a single continuous gesture on a gesture-sensitive keyboard as the gesture is being performed, and wherein each of the one or more features relates to one of a plurality of targets of the gesture-sensitive keyboard that a user may have intended to input when performing the gesture. The system comprises a prediction means configured to predict one or more terms from the one or more features, the prediction means comprising a prefix tree generating means configured to generate a prefix tree of terms which includes the one or more features, a path finding means configured to find one or more paths through the prefix tree of terms given the one or more features, and a predictor. A corresponding method is also provided.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/372,445, filed Jul. 15, 2014, which is the National Stage of International Application No. PCT/GB2012/052981, filed Nov. 30, 2012, which claims the benefit of Great Britain Patent Application No. 1200643.3, filed Jan. 16, 2012, the disclosures of which are incorporated herein by reference in their entireties.

FIELD OF THE INVENTION

The present invention relates to a system and method for predicting one or more terms from a user gesture across a gesture-sensitive keyboard. In particular, the one or more terms may be predicted from a single continuous gesture by a user.

BACKGROUND

There are existing systems which predict single words on the basis of a user “stroke” across a touch-sensitive keyboard/screen.

One such system, as disclosed in U.S. Pat. No. 7,098,896, titled “System and Method for Continuous Stroke Word-based text input”, comprises a store of words in a database. The method of predicting a word from the input stroke pattern comprises comparing the input stroke pattern with a set of the words in the database. The method comprises identifying a “final touch point” in the stroke pattern in order to identify the terminating character for a candidate word. Predictions for candidate words are then based on the first and last characters of the word, where these characters have been identified from the input stroke pattern. The actual path length of the inputted stroke pattern is compared to an expected path length which is stored with each word in the database of words.

An alternative approach to predicting a word from a continuous stroke has been disclosed in U.S. Pat. No. 7,251,367, titled “System and Method for Recognizing Word Patterns Based on a Virtual Keyboard Layout”. In this disclosure, stroke input patterns are compared, again within a single-word boundary, to a pre-determined library of stroke patterns. This technique has been demonstrated both with a completely pre-defined library and with a dynamically generated library.

All of the known systems, including those discussed above, are based around the principle of solving the problem of matching an input stroke, which is intended to represent a word, with a word in a database, based on the input stroke pattern and defined start and end points that correspond (approximately) to the start and end characters of individual words within the database.

The problem with the known systems and methods is that the input strokes are restricted to correspondence with the start and end characters of words in the database, requiring and restricting the user to enter strokes corresponding to full single words. Thus, the known systems and methods cannot predict words based on an input stroke which corresponds to the prefix of a word or an input stroke which corresponds to multiple words, for example where the user is trying to input a phrase with a single continuous stroke. It is an objective of the present invention to solve the above-mentioned problems.

SUMMARY OF THE INVENTION

In a first aspect of the invention there is provided a method for predicting one or more terms from a single continuous gesture across a gesture-sensitive keyboard. The method comprises:

sampling at a plurality of times the location of the gesture on the gesture-sensitive keyboard as the gesture is performed; and predicting one or more terms from the plurality of samples by: generating one or more features from the plurality of samples, wherein each of the one or more features relates to a target on the gesture-sensitive keyboard that a user may have intended to input when performing the gesture; generating a prefix tree of terms which includes the one or more features; and finding one or more paths through the prefix tree of terms given the one or more features.

Preferably, the prefix tree of terms is represented by a graph, and the method includes generating the graph using graph theory.

Generating one or more features from the plurality of samples preferably comprises identifying a location of the gesture on the gesture-sensitive keyboard where the user may have intended to pass through a target of the gesture-sensitive keyboard. The location of the feature is preferably the location of the gesture where the gesture passes closest to the target. The target may be a point target or a line target. Preferably, a feature is identified for each target on the gesture-sensitive keyboard. Preferably, a feature is only retained if the minimum distance between the feature and the target is below a threshold distance.

Each feature may comprise a distance metric which corresponds to the minimum distance between the gesture and the target.
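The feature generation described above can be sketched as follows. This is a minimal, hypothetical illustration only: it assumes point targets at key centres, keeps one feature per target (the sample where the gesture passes closest), and omits any hysteresis. The names `Feature` and `point_features` are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    char: str        # target (key) the user may have intended to input
    index: int       # index of the sample where the gesture passes closest
    distance: float  # distance metric: minimum gesture-to-target distance


def point_features(samples, targets, threshold):
    """Generate at most one feature per point target, keeping only close passes.

    samples: list of (x, y) gesture locations in sampling order.
    targets: dict mapping a character to the (x, y) centre of its key.
    """
    features = []
    for char, (tx, ty) in targets.items():
        dists = [math.hypot(x - tx, y - ty) for x, y in samples]
        closest = min(range(len(dists)), key=dists.__getitem__)
        if dists[closest] < threshold:  # retain only if below the threshold distance
            features.append(Feature(char, closest, dists[closest]))
    # Order features along the gesture so that they form a sequence.
    return sorted(features, key=lambda f: f.index)
```

A gesture sampled at (0, 0.5), (5, 0) and (10, 0.2) over keys "a" at (0, 0), "b" at (10, 0) and "c" at (5, 10) yields features for "a" and "b" only, since it never comes within the threshold of "c".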

The prefix tree of terms is preferably generated by retaining the terms of a dictionary prefix tree which are allowed given the one or more features. A term of the dictionary prefix tree may be retained even if a feature does not correspond to that term.
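One way to picture which dictionary terms are "allowed" is sketched below. It is a simplified, hypothetical illustration that filters a flat word list instead of walking a prefix tree, and it assumes that unused features may be skipped freely while up to `max_missing` letters of a word may have no feature at all; the minimum number of such letters is the word length minus the longest common subsequence with the feature characters.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b):
            cur.append(prev[j] + 1 if ch == other else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def allowed_terms(words, feature_chars, max_missing=0):
    """Retain words whose letters can be matched, in order, to the features.

    Features a word does not use are skipped; up to max_missing letters of the
    word may lack a corresponding feature (a retained term with no feature).
    """
    kept = []
    for word in words:
        missing = len(word) - lcs_length(word, feature_chars)
        if missing <= max_missing:
            kept.append(word)
    return kept
```

With features "c", "x", "a", "t", only "cat" is retained when no missing letters are permitted; allowing one missing letter also retains "cart".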

The targets of the gesture-sensitive keyboard may correspond to the letters of the alphabet, and optionally word boundary delimiters, such as a space, and/or punctuation symbols. The prefix tree of terms may comprise one or more nodes representing the last letter of a completed word, wherein generating the prefix tree of terms may further comprise inserting a node corresponding to a space character into the prefix tree where there is a node corresponding to the last letter in a word. The probability associated with the node corresponding to the space character is preferably reduced if a feature associated with that space character has not been identified. Generating the prefix tree of terms may further comprise generating, at the node corresponding to the space character, a new prefix tree of terms generated by retaining the terms of a dictionary prefix tree which are allowed given the remaining features in a sequence of one or more features.
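The effect of inserting space nodes at word endings can be sketched as follows. This hypothetical example assumes the feature characters match the intended letters exactly and that no space feature was identified, so every inferred space adds a fixed cost `space_penalty` (i.e. reduces the probability); after each word end, matching restarts from the dictionary root on the remaining features.

```python
from functools import lru_cache


def phrases(chars, dictionary, space_penalty=2.0):
    """Return sorted (cost, phrase) pairs that cover all of chars with dictionary words.

    Each word end followed by further input receives an inferred space node whose
    cost is space_penalty, because no space feature is assumed to be present.
    """
    @lru_cache(maxsize=None)
    def expand(i):
        if i == len(chars):
            return [(0.0, [])]
        out = []
        for j in range(i + 1, len(chars) + 1):
            word = chars[i:j]
            if word in dictionary:
                # Word end: continue with a fresh dictionary match on the rest.
                for cost, rest in expand(j):
                    extra = space_penalty if rest else 0.0
                    out.append((cost + extra, [word] + rest))
        return out

    return sorted((cost, " ".join(words)) for cost, words in expand(0))
```

For the input "howareyou", the phrase "how are you" (two inferred spaces) ranks ahead of "how a re you" (three inferred spaces), mirroring how spaces can be inferred for likely candidate phrases.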

The method preferably comprises pruning the prefix tree of terms to remove all paths through the prefix tree of terms for which the ratio of the probability of a given path over the probability of the most likely path is below a predetermined threshold. The node representing a space character may comprise meta-data to prune the new prefix tree of terms on the basis of context data.
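The ratio-based pruning rule can be written in a few lines. The sketch below is illustrative only and assumes path probabilities are held in a plain dictionary keyed by the partial term; `prune_paths` is a hypothetical name.

```python
def prune_paths(path_probs, ratio_threshold):
    """Drop paths whose probability relative to the most likely path is below the threshold."""
    if not path_probs:
        return {}
    best = max(path_probs.values())
    return {path: p for path, p in path_probs.items() if p / best >= ratio_threshold}
```

In cost terms (cost = -ln p), the same rule keeps a path whenever its cost exceeds the best path's cost by no more than -ln(ratio_threshold), which avoids the division.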

Generating the prefix tree of terms may further comprise allowing a given feature to represent a repeated instance of the character it relates to, by retaining the terms of a dictionary prefix tree which include character repetition if there is a valid path given the one or more features through the prefix tree for this repeated character.
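A simple way to let one feature stand for a repeated character is to compare the feature sequence with the word after collapsing runs of the same letter. The sketch below is a hypothetical simplification that requires an exact match after collapsing; it is not the patent's trie-based procedure.

```python
from itertools import groupby


def collapse_repeats(word):
    """'hello' -> 'helo': a single feature may represent a run of one letter."""
    return "".join(ch for ch, _ in groupby(word))


def matches_with_repeats(word, feature_chars):
    """True if the features spell the word once repeated letters are merged."""
    return collapse_repeats(word) == "".join(feature_chars)
```

A gesture yielding features "h", "e", "l", "o" can therefore still retain "hello", since the single "l" feature is allowed to represent "ll".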

In one embodiment, finding one or more paths through the prefix tree of terms comprises using a path-finding algorithm. The path-finding algorithm may use the distance metrics to generate a probability estimate associated with each path through the prefix tree of terms. The path-finding algorithm is preferably configured to return, as the one or more terms, terms for which the corresponding route has a probability estimate above a threshold value.
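The text does not name a specific path-finding algorithm. As one possible choice, the sketch below uses uniform-cost (Dijkstra-style) search over a weighted prefix tree, converting cumulative cost to probability with p = exp(-cost) and stopping once no remaining path can exceed the probability threshold. The adjacency format `node -> [(next_node, char, cost)]` is an assumption made for the example.

```python
import heapq
import math


def find_terms(edges, start, terminals, p_threshold):
    """Return [(term, probability)] for terminal nodes with probability above p_threshold.

    edges: dict node -> list of (next_node, char, edge_cost).
    terminals: set of nodes at which a complete term ends.
    """
    max_cost = -math.log(p_threshold)
    heap = [(0.0, start, "")]
    results = []
    while heap:
        cost, node, term = heapq.heappop(heap)
        if cost >= max_cost:
            break  # costs only grow, so no remaining path can pass the threshold
        if node in terminals:
            results.append((term, math.exp(-cost)))
        for nxt, char, edge_cost in edges.get(node, []):
            heapq.heappush(heap, (cost + edge_cost, nxt, term + char))
    return results
```

Because paths are expanded in order of increasing cost, the results come out ranked from most to least probable, and the search never explores paths that could not meet the threshold.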

In an alternative embodiment, finding one or more paths through the prefix tree of terms comprises: identifying one or more features which correspond to the end location of the gesture; and assigning an indication of the cumulative probability for a given path to any node representing the one or more features that correspond to the location of the end of the gesture, only if that node corresponds to a leaf of the prefix tree of terms. The one or more paths may be ordered by their cumulative probabilities and the path(s) for which the cumulative probability is above a threshold value are returned, wherein the returned one or more paths correspond to the one or more terms.
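The ordering and thresholding in this alternative embodiment can be sketched as below. The tuple layout `(term, last_feature, cumulative_p, is_leaf)` is a hypothetical representation of a path: a path is scored only when it ends at a leaf on a feature that corresponds to the end of the gesture.

```python
def rank_end_paths(paths, end_features, p_threshold):
    """Order qualifying paths by cumulative probability and keep those above the threshold.

    paths: iterable of (term, last_feature, cumulative_p, is_leaf).
    end_features: features corresponding to the end location of the gesture.
    """
    scored = [
        (p, term)
        for term, last, p, is_leaf in paths
        if is_leaf and last in end_features  # only leaves at an end feature are scored
    ]
    scored.sort(reverse=True)
    return [(term, p) for p, term in scored if p > p_threshold]
```

Here a non-leaf node such as the prefix "ca", or a leaf whose last feature is not at the end of the gesture, never receives a cumulative probability and so is never returned.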

In either embodiment, the one or more terms are predicted on the basis of all the currently available samples. The method may comprise periodically updating the prediction of the one or more terms as the single continuous stroke is being performed and more samples are generated.

Predicting the one or more terms may comprise predicting one or more words. The one or more words may be predicted on the basis of a single continuous gesture which corresponds to the user gesturing over one or more characters on a gesture-sensitive keyboard intended to indicate the prefix for that word. Predicting the one or more words may comprise predicting a phrase comprising a sequence of two or more words on the basis of a single continuous gesture which corresponds to the user gesturing over characters for multiple words on a gesture-sensitive keyboard. Preferably, the method comprises using context information to tailor the prediction of one or more terms.

Preferably, sampling is performed at a predetermined frequency. The sampling frequency may be about 60 Hz.

The prediction of the one or more terms is preferably based on the topography of the gesture-sensitive keyboard in combination with gesture velocity and/or gesture curve direction. The probability of a path through the prefix tree may be dependent on the topography of the gesture between two features and the targets of the keyboard associated with those two features. The probability of the path may be based on a monotonically decreasing function of the difference between a straight-line distance between the two targets and the curved length of the gesture between the two targets. The probability of the path may be based on a monotonically decreasing function of the difference between the direction of a straight line between the two targets and the gesture direction at each point between the two targets.
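The two monotonically decreasing functions described above can be illustrated with exponentials. This is a sketch under stated assumptions: the exponential form, the `scale` parameter, and averaging the angular deviation over gesture segments are all choices made for the example, not taken from the text.

```python
import math


def path_length(points):
    """Curved length of the gesture through the given (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_score(points, start, end, scale=1.0):
    """Decreasing in (curved gesture length - straight-line distance between targets)."""
    excess = path_length(points) - math.dist(start, end)
    return math.exp(-excess / scale)


def direction_score(points, start, end, scale=1.0):
    """Decreasing in the mean angle between gesture segments and the target-to-target line."""
    ref = math.atan2(end[1] - start[1], end[0] - start[0])
    devs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ang = math.atan2(y1 - y0, x1 - x0)
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        devs.append(abs((ang - ref + math.pi) % (2 * math.pi) - math.pi))
    return math.exp(-sum(devs) / len(devs) / scale) if devs else 1.0
```

A gesture that travels straight from one target to the next scores 1.0 on both measures, while a detour scores lower on both, so such a path receives a lower probability.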

The gesture may be a stroke and the gesture-sensitive keyboard a touch-sensitive keyboard. The method may comprise detecting pressure from a user stroking the keyboard to form the single continuous stroke, and the step of sampling comprises sampling the location at which pressure is present. The sampling may comprise detecting a pressure value and a location at a given time.
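Sampling only the locations at which pressure is present can be sketched as a filter over raw touch readings. The example is hypothetical: it assumes readings arrive as `(x, y, p, t)` tuples, that a pressure at or below `min_pressure` means no contact, and that the first such reading after contact marks lift-off.

```python
def pressed_samples(readings, min_pressure=0.0):
    """Return the (x, y, p, t) readings belonging to a single stroke.

    Readings before first contact are ignored; the first reading without
    pressure after contact is treated as lift-off, ending the stroke.
    """
    stroke = []
    for x, y, p, t in readings:
        if p <= min_pressure:
            if stroke:
                break      # contact broken: the stroke is finished
            continue       # finger not yet on the keyboard
        stroke.append((x, y, p, t))
    return stroke
```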

In a second aspect of the invention, there is provided a computer program product comprising a computer readable medium having stored thereon computer program means for causing a processor to carry out a method as described above.

In a third aspect of the invention, there is provided a system. The system comprises a feature identification means configured to generate one or more features from a plurality of samples, wherein each of the plurality of samples is sampled at a different time and corresponds to a location of a single continuous gesture on a gesture-sensitive keyboard as the gesture is being performed, and wherein each of the one or more features relates to one of a plurality of targets of the gesture-sensitive keyboard that a user may have intended to input when performing the gesture; and a prediction means configured to predict one or more terms from the one or more features, the prediction means comprising: a prefix tree generating means configured to generate a prefix tree of terms which includes the one or more features; a path finding means configured to find one or more paths through the prefix tree of terms given the one or more features; and a predictor.

The system further comprises a gesture-sensitive keyboard comprising a plurality of targets and configured to receive a single continuous gesture as input.

The system may further comprise a sampling means for sampling at a plurality of times the location of the gesture on the gesture-sensitive keyboard as the gesture is performed.

The prefix tree generating means may be configured to generate a graph by graph theory, wherein the graph represents the prefix tree of terms.

The feature identification means is preferably configured to generate one or more features from the plurality of samples by identifying a location of the gesture on the gesture-sensitive keyboard where the user may have intended to pass through a target of the gesture-sensitive keyboard. The location of the feature is preferably the location of the gesture where the gesture passes closest to the target. The target may be a point target or a line target. The feature identification means is preferably configured to identify a feature for each of the plurality of targets and may be configured to retain a feature only if the minimum distance between the feature and the target is below a threshold distance.

Preferably, each feature comprises a distance metric which corresponds to the minimum distance between the gesture and the target.

The prefix tree generating means is preferably configured to generate the prefix tree of terms by retaining the terms of a dictionary prefix tree which are allowed given the one or more features. A term of the dictionary prefix tree may be retained even if a feature does not correspond to that term.

The plurality of targets may correspond to the letters of the alphabet, and optionally word boundary delimiters, such as a space, and/or punctuation symbols. The prefix tree of terms may comprise one or more nodes representing the last letter of a completed word, and the prefix tree generating means is configured to insert a node corresponding to a space character into the prefix tree where there is a node corresponding to the last letter in a word. Preferably, the prefix tree generating means is configured to reduce the probability associated with the node corresponding to the space character if the feature identification means has not identified a feature associated with that space character. The prefix tree generating means is preferably configured to generate, at the node corresponding to the space character, a new prefix tree of terms generated by retaining the terms of a dictionary prefix tree which are allowed given the remaining features in a sequence of one or more features.

The prefix tree generating means may be configured to prune the prefix tree of terms to remove all paths through the graph for which the probability of the path is below a predetermined threshold. The prefix tree generating means is preferably configured to associate meta-data with the node representing a space character to prune the new prefix tree of terms on the basis of context data.

The prefix tree generating means may be configured to allow a given feature to represent a repeated instance of the character it relates to, by retaining the terms of a dictionary prefix tree which include character repetition if there is a valid path given the one or more features through the prefix tree for this repeated character.

In one embodiment, the path finding means is a path-finding algorithm. The path-finding algorithm may be configured to use the distance metrics to generate a probability estimate associated with each path through the prefix tree of terms. The path-finding algorithm is preferably configured to return, as the one or more terms, terms for which the corresponding route has a probability estimate above a threshold value.

In an alternative embodiment, the feature identification means is configured to determine one or more features which correspond to the end location of the gesture, and the prefix tree generating means is configured to assign an indication of the cumulative probability for a given path to any node representing the one or more features that correspond to the location of the end of the gesture, only if that node corresponds to a leaf of the prefix tree of terms. The path finding means is preferably configured to order the paths by their cumulative probabilities and to return, as the one or more terms, terms for which the corresponding route has a cumulative probability estimate above a threshold value.

The predictor is preferably configured to predict the one or more terms on the basis of all the currently available samples and may be configured to periodically update the prediction for the one or more terms as the single continuous stroke is being performed and the sampling means generates more samples.

The predictor is preferably configured to predict one or more words. The predictor may be configured to predict the one or more words on the basis of a single continuous gesture which corresponds to the user gesturing over one or more characters on a gesture-sensitive keyboard intended to indicate the prefix for that word. The predictor may be configured to predict a phrase comprising a sequence of two or more words on the basis of a single continuous gesture which corresponds to the user gesturing over characters for multiple words on a gesture-sensitive keyboard. The predictor is preferably configured to use context information to tailor the prediction of one or more terms.

The sampling means is preferably configured to sample at a predetermined frequency. The sampling frequency may be about 60 Hz.

The predictor is preferably configured to predict the one or more terms on the basis of the topography of the gesture-sensitive keyboard in combination with gesture velocity and/or gesture curve direction. The predictor is preferably configured to predict a path through the prefix tree dependent on the topography of the gesture between two features and the targets of the keyboard associated with those two features. The probability of the path may be based on a monotonically decreasing function of the difference between a straight-line distance between the two targets and the curved length of the gesture between the two targets. The probability of the path may be based on a monotonically decreasing function of the difference between the direction of a straight line between the two targets and the gesture direction at each point between the two targets.

The gesture-sensitive keyboard is preferably a touch-sensitive keyboard and the single continuous gesture is a stroke across the touch-sensitive keyboard. The touch-sensitive keyboard may be configured to detect pressure from a user stroking the touch-sensitive keyboard, and the sampling means is configured to sample the location of the stroke at which pressure is present. The sampling means may be configured to detect a pressure value and a location at a given time.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described in detail with reference to the accompanying drawings, in which:

FIG. 1 is an illustration of the use of the method and system of the present invention and, in particular, illustrates a single continuous user gesture across a user interface comprising a gesture-sensitive keyboard to input a phrase and the corresponding predicted phrase which is displayed on a display of the user interface;

FIG. 2 is an illustration of the use of the method and system of the present invention and, in particular, illustrates a single continuous user gesture across a user interface comprising a gesture-sensitive keyboard to input a prefix to a word and the corresponding predicted word which is displayed on a display of the user interface;

FIG. 3 is a flow diagram to illustrate the continuous processing of an input stream in accordance with the present invention;

FIG. 4 is a flow diagram to illustrate the sub-process which is followed to convert input samples to update predictions in accordance with the present invention;

FIG. 5 illustrates a view of a feature generating algorithm with threshold and hysteresis in accordance with the present invention;

FIGS. 6a and 6b illustrate the generation of a prefix tree of terms (FIG. 6a) from a reference prefix tree (FIG. 6b) representing a dictionary in accordance with the present invention;

FIG. 7 illustrates terminal word endings in a prefix tree of terms in accordance with the present invention;

FIG. 8 illustrates how a prefix tree generating means of the method and system of the present invention deals with terminal word endings when generating a prefix tree of terms;

FIG. 9 illustrates the implementation of a cost function based on distance and length variables for a trace between two points in a gesture, in accordance with the present invention;

FIG. 10 illustrates the implementation of a cost function based on the area available for the trace between two points, in accordance with the invention;

FIG. 11 illustrates the implementation of a cost function based on end-point directions for a trace between two points given two neighbours, in accordance with the invention;

FIG. 12 is a schematic drawing of a system according to the present invention;

FIG. 13 illustrates an example use of the method and system of the invention and, in particular, a feature graph for the first few samples from an example single continuous gesture;

FIG. 14 illustrates an example use of the system of the invention and, in particular, a gesture across a gesture-sensitive keyboard for a user gesture intending to indicate the terms “a wet”;

FIG. 15 illustrates a partial feature graph generated for the single continuous gesture as illustrated in FIG. 14.

DETAILED DESCRIPTION OF THE INVENTION

The disclosed system and method allow a user to enter arbitrary-length sequences of character-based input, potentially consisting of multiple words or a prefix for a word, with a single continuous stroke, while maintaining regular feedback in the form of prefix, word and/or phrase predictions.

Thus, the system and method of the present invention provide text entry for a gesture-sensitive device where users can perform a single continuous gesture to indicate intended word or phrase input. The examples that follow describe a touchscreen or touch-sensitive device. A touchscreen device of the described embodiment allows a user to enter words or phrases without requiring the user to break contact with the screen between letters or words, by moving their finger to select characters in sequence while the system simultaneously processes the input and produces predictions.

The present invention offers a significant improvement over known devices because it provides more accurate and flexible text entry, with an increase in the speed at which text can be input.

A user input gesture, which for the purposes of the described touchscreen embodiment will be referred to as a “stroke”, is terminated by the user lifting their finger off the touchscreen. The system interprets the user breaking contact with the screen as an indication that there are no more words from this point in this particular stroke.

The present system and method make term predictions on the basis of a continuous stream of samples, each sample representing the location of the gesture at a point in time, whilst the gesture is being performed. By sampling in this way, the system and method are able to offer greater functionality than word-matching systems. The system and method of the present invention are thus able to scale in both directions, i.e. to less than one word (a prefix) or to several words at a time, thereby solving the problems of the known systems and methods and providing a more flexible alternative, as will be discussed in relation to FIGS. 1 and 2.

FIG. 1 shows an example of a user gesture, intended to indicate multiple words, in this case “How are you”, traced across a touch-sensitive keyboard. As illustrated, there is no requirement for the user to lift their finger from the screen between words. The probabilistic approach this system takes (as will be described below) means that, for natural language, it is possible to infer the presence of spaces for likely candidate phrases.

FIG. 2 shows an example of a user's gesture to indicate a prefix for a word, in this example the prefix being “pl” for the word “please”. In this example, the most probable word for the beginning of a sentence (i.e. a prediction where there are no words for context) given an input gesture passing through “p” and “l” is the word “please”, so this word is displayed by the system in a display pane of a user interface comprising the touchscreen keyboard. The use of contextual evidence provided by a probabilistic context model, as will be discussed below, allows the system and method to take into account context information for shorter strokes, such as a prefix of a desired word, which improves the accuracy of the text predictions provided to the user.

The examples illustrated in FIGS. 1 and 2 relate to inputting text using a virtual keyboard which does not include a spacebar; however, the invention is equally applicable to text inputted by a gesture across a keyboard including a spacebar. The system and method of the present invention enable a user to input multiple words via a single continuous gesture. Owing to the way in which the text predictions are generated, it is possible for multiple words or a phrase to be inferred from a single continuous gesture, even in the case where the user does not indicate word boundaries.

With reference to the remaining figures, the specifics of how the invention is realised will be described below in accordance with an example method and system.

As stated above, the method and system of the present invention sample the location of the gesture with time as the gesture is being performed. For a touchscreen device, pressure values can also be sampled at the same time as location to provide further information about the word/phrase the user's stroke is intended to represent. The sampling of location and/or pressure values may be carried out in any way known in the art. The system and method model this continuous stream of user input samples to provide term/word/phrase predictions. A stroke input can be defined as a (potentially infinite) sequence of samples, s, where:

$$s = \{x, y, p, t\}$$

The values x and y are merely coordinate values representing the physical location of the user's finger at the time of sampling, p is a pressure reading and t is the time at which the sample was taken. Thus a stroke, S, is given as:

$$S = \{s_1, s_2, \ldots, s_\infty\}$$
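The sample and stroke definitions can be mirrored directly in code. The sketch below is illustrative: `Sample` follows the {x, y, p, t} definition, while `sample_stroke` and its `position_at` callback are hypothetical helpers that sample a synthetic trace at the fixed rate of about 60 Hz mentioned earlier (one sample roughly every 16.7 ms).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    x: float  # horizontal position of the finger
    y: float  # vertical position of the finger
    p: float  # pressure reading
    t: float  # time at which the sample was taken (seconds)


def sample_stroke(position_at, duration, rate_hz=60.0, pressure=1.0):
    """Sample a hypothetical trace function position_at(t) -> (x, y) at a fixed rate."""
    period = 1.0 / rate_hz
    n = int(duration * rate_hz) + 1  # include the sample at t = 0
    stroke = []
    for i in range(n):
        t = i * period
        x, y = position_at(t)
        stroke.append(Sample(x, y, pressure, t))
    return stroke
```

Sampling a half-second horizontal swipe at 60 Hz in this way gives 31 samples, from t = 0 to t = 0.5 s.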

This provides a definition of the core input requirement for the system. The system functions to convert the input stroke into an estimate of the probability of the sample sequence representing a particular character sequence C at the point in time when a prediction is requested:

$$p(\{s_1, \ldots, s_n\} \mid C)$$

In this probability estimate, n is the index of the most recent sample (the sample with the latest time t). C is a prediction of what the user intended to enter via the stroke and comprises a sequence of characters which may include spaces. This estimate can be used as an extra evidence source for any other system or method which makes term/word/phrase predictions on the basis of multiple evidence sources, for example the system and method as described in international patent application no. PCT/GB2011/001419, entitled “Text prediction engine, system and method for inputting text into electronic devices”, the content of which is incorporated herein by reference in its entirety.

The system of the present invention is preferably configured to continually provide this probability estimate, and may provide this estimate as input to the text prediction mechanism in international patent application no. PCT/GB2011/001419, which takes the estimate into account with other evidence sources (such as context) to generate more accurate text predictions.

As stated previously, the continuous stream of samples of the form {x, y, p, t} needs to be processed to generate an estimate of the probability of the sample sequence representing a particular character sequence. This processing step is illustrated in FIG. 3. As shown, whilst the user is touching the screen and performing the stroke, samples are taken. The system continuously processes the sequence of samples generated from the stroke. Thus, whilst the user remains in contact with the screen, samples continue to be taken and these samples are processed along with all of the preceding samples of that gesture, to provide continually updated predictions with associated probability estimates. Once the user breaks contact with the screen, it is assumed that the stroke is finished and that there are no more characters/words from this point for this particular stroke.
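The continuous processing loop of FIG. 3 can be sketched as a generator. This is a hypothetical outline: `predict` stands in for the whole feature/prefix-tree/path-finding sub-process, samples are assumed to be `(x, y, p, t)` tuples with zero pressure marking lift-off, and the update interval `every` is an arbitrary choice.

```python
def continuous_predictions(sample_stream, predict, every=5):
    """Yield updated predictions while a stroke is in progress.

    sample_stream: iterable of (x, y, p, t); p <= 0 marks the finger lifting off.
    predict: hypothetical callback scoring *all* samples of the stroke so far.
    """
    samples = []
    for s in sample_stream:
        if s[2] <= 0:            # contact broken: no more characters in this stroke
            break
        samples.append(s)
        if len(samples) % every == 0:
            yield predict(samples)   # periodic update from every sample so far
    if samples:
        yield predict(samples)       # final prediction for the completed stroke
```

Using `len` as a stand-in predictor on a stroke of 12 pressed samples yields updates after 5 and 10 samples, then a final prediction over all 12.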

A sub-process is required to convert the sequence of samples of the form {x, y, p, t} into one or more candidate character sequences with associated probability estimates. FIG. 4 illustrates the conversion of ‘raw’ input samples into probability estimates/updates. As illustrated, a three-step process is provided to convert the raw samples into candidate character sequences with associated probability estimates.

As shown in FIG. 4, the first step of the sub-process is to convert the raw samples into a sparser sequence of more abstract “features”, where each feature may represent a candidate character for inclusion in the predictions (i.e. the candidate character sequence) along with distance metrics to be used for calculating probability estimates. The conversion of the raw samples into a sequence of features will be explained below.

Once the raw samples have been converted into features, the second step of the sub-process comprises generating a prefix tree of terms. The prefix tree of terms is generated from a reference dictionary prefix tree by retaining paths of the reference prefix tree which are valid given potential sub-sequences of features. The reference prefix tree is a complete prefix tree for an entire dictionary and is therefore a static read-only data structure. The reference prefix tree can be stored in a memory of the system. In contrast, the prefix tree of terms is a dynamically generated prefix tree which is a sparse copy of the reference prefix tree incorporating only elements that are relevant to the stroke pattern. The prefix tree of terms is usually discarded after generating a prediction for a given stroke pattern. However, in some cases, it may be preferable to retain the prefix tree of terms to provide context evidence for the next predictions predicted from a consecutive stroke pattern, e.g. in a situation where there are two stroke patterns representing consecutive terms in a sentence/phrase.

The reference dictionary prefix tree can be any reference dictionary prefix tree known in the art, for example any of the references cited on http://en.wikipedia.org/wiki/Trie or a model of the English language based on English language text as discussed in International Patent Application Publication No. WO 2010/112841, entitled “System and method for inputting text into electronic devices”, the content of which is hereby incorporated by reference in its entirety. In a preferred embodiment, the prefix tree of terms is represented by a graph, and generating a prefix tree of terms comprises constructing a weighted, directional graph of the potential sub-sequences of features.
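A reference dictionary prefix tree of the kind referred to above can be as simple as nested dictionaries, built once and then only read. The sketch below is a generic trie, not the specific structure of any cited reference; `END` and the helper names are hypothetical.

```python
END = "$"  # hypothetical marker meaning "a dictionary word ends here"


def build_trie(words):
    """Build a reference dictionary prefix tree as nested dicts (built once, then read-only)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def walk(trie, prefix):
    """Return the node reached by following prefix, or None if it is not a valid prefix."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return None
    return node
```

Because `walk` never modifies the trie, the same static structure can be shared across gestures while each gesture builds its own sparse prefix tree of terms.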

Given a reference dictionary prefix tree for relevant languages, a weighted, directional graph is constructed as follows:

- Define a Node, N, in the graph to be a pair, {f, t}, where f is the feature this node represents and t is the node in the reference dictionary prefix tree it corresponds to;
- Define an Edge, E, to represent both the existence of a valid prefix tree connection from the character represented by a feature to some later feature and the cost associated with that path (cost functions are discussed below); and
- Build a graph as a set of nodes and edges from the features.
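The construction steps listed above can be sketched as follows. In this hypothetical example a prefix string stands in for the reference-trie node t, a node is the pair (feature index, prefix), and each edge cost is simply the distance metric of the feature reached; the actual cost functions are discussed later in the specification and are not reproduced here. Node insertion for missing features is also omitted.

```python
def build_graph(features, words):
    """Build a weighted, directed graph of valid feature sub-sequences.

    features: ordered list of (char, distance) pairs from feature identification.
    words: dictionary words; a prefix string stands in for a reference-trie node.
    Returns dict node -> list of (next_node, cost), where a node is (feature_index, prefix).
    """
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    root = (-1, "")
    edges, seen, frontier = {}, {root}, [root]
    while frontier:
        node = frontier.pop()
        i, prefix = node
        for j in range(i + 1, len(features)):  # later features only; others may be skipped
            ch, distance = features[j]
            if prefix + ch in prefixes:          # valid prefix tree connection
                nxt = (j, prefix + ch)
                # Assumed edge cost: the distance metric of the feature reached.
                edges.setdefault(node, []).append((nxt, distance))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return edges
```

For features "c", "x", "a", "t" and the dictionary {"cat", "at"}, the graph reaches the prefixes "c", "ca", "cat", "a" and "at", while the spurious "x" feature is simply skipped.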

Preferably, the construction of the graph further comprises inserting a node into the graph if a node t is present in the reference dictionary prefix tree, even if a feature has not been identified by the feature identification means for this character. By inserting a node into the graph when such a node exists in the reference dictionary prefix tree, the system and method are able to improve accuracy when correcting misspelt or mistyped words. The insertion of nodes into a graph when there is no identified feature corresponding to the character associated with that node will be explained in greater detail below, along with a discussion of the cost of the graph edge, i.e. the cost that is assigned to the inserted node.

Although estimates of the probability for each candidate word or phrase are required, it is more natural to work with the concept of a “cost” when dealing with graph edges, since costs add along a path where probabilities would multiply. Therefore, in the description that follows, the probability associated with a given path is often stated in terms of the cost of that path. The probability, p(C), of a candidate, C, can be derived from its cumulative cost, c, from the root to a terminal node in the graph as p(C) = e^(−c) or, conversely, the cost can be obtained from a probability estimate as c = −ln(p(C)).
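The cost/probability conversion is a one-line transform in each direction. The sketch below (helper names are hypothetical) also shows why costs are convenient: summing costs along a path is equivalent to multiplying the corresponding probabilities.

```python
import math


def cost_from_probability(p: float) -> float:
    """c = -ln(p(C)); a probability of 1 maps to a cost of 0."""
    return -math.log(p)


def probability_from_cost(c: float) -> float:
    """p(C) = exp(-c)."""
    return math.exp(-c)


# Hypothetical per-edge probabilities along one path through the graph.
edge_probs = [0.5, 0.2, 0.9]
path_cost = sum(cost_from_probability(p) for p in edge_probs)
print(round(probability_from_cost(path_cost), 6))  # 0.09 == 0.5 * 0.2 * 0.9
```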

Thus the system and method of the present invention, at the second and third steps of the sub-process, has a first reference dictionary prefix tree (which is stored in a memory of the device) and a second, dynamically generated, prefix tree (which we have referred to as the prefix tree of terms or the graph). The use of a static reference prefix tree and a dynamically generated prefix tree enables efficient continuous prediction by providing a straightforward resume operation when one gesture is a prefix of another. It also allows for the use of path-finding algorithms to efficiently explore the tree (which will be apparent from the description of step three in the sub-process which follows). The dynamically generated tree is used for the lifetime of a single gesture (or for the lifetime of the next gesture if the first gesture is a prefix of the next gesture), then discarded, whereas the static prefix tree is retained unmodified. The terms ‘prefix tree of terms’ and ‘graph’ are used interchangeably, since the prefix tree of terms has cost values associated with the connections between the nodes.

Returning to FIG. 4, the final step in the conversion of the raw input samples into predictions with associated probability estimates is to locate valid paths through the prefix tree or graph. Paths through the prefix tree of terms represent possible predictions of the candidate character sequence. The cheapest or most probable paths through the prefix tree or graph are returned as the updated predictions in the flow chart of FIG. 3. In one embodiment, a path-finding algorithm can be used to locate the cheapest (or most probable) paths through the prefix tree or graph. Any path-finding algorithm known in the art can be used, for example Dijkstra or A*. However, alternative approaches to identifying the most probable paths can be employed, as will be discussed later. The probability estimate for a given prediction is derived from the cumulative cost through the features of the graph or prefix tree for that given path. The implementation of a cost function, to provide a cost to each leaf of the prefix tree, is discussed later.
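As an illustration of the path-finding option, the following is a standard Dijkstra search over a small hand-built graph with non-negative edge costs. The node labels, edge costs and the `cheapest_paths` helper are hypothetical; the source does not specify how its path finder is implemented.

```python
import heapq
import math


def cheapest_paths(edges, root, terminals):
    """Dijkstra's algorithm from root over non-negative edge costs.

    edges maps each node to a list of (child, cost) pairs. Returns
    {terminal: (cumulative_cost, path)} for every reachable terminal node.
    """
    best = {root: 0.0}
    parent = {}
    heap = [(0.0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale entry: a cheaper route to node was already found
        for child, w in edges.get(node, []):
            new_cost = cost + w
            if new_cost < best.get(child, math.inf):
                best[child] = new_cost
                parent[child] = node
                heapq.heappush(heap, (new_cost, child))
    result = {}
    for t in terminals:
        if t in best:
            path = [t]
            while path[-1] != root:
                path.append(parent[path[-1]])
            result[t] = (best[t], path[::-1])
    return result


# Hypothetical graph for the features c, d, o, f, g, h; nodes are labelled by prefix.
edges = {"": [("d", 0.6), ("o", 1.1)], "d": [("do", 0.1)],
         "do": [("dog", 0.6)], "o": [("of", 0.1)]}
results = cheapest_paths(edges, "", ["dog", "of"])
for word, (cost, path) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(word, path, round(cost, 2), math.exp(-cost))
```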

The predictions and their associated probability estimates can be provided to the system described in international patent application no. PCT/GB2011/001419 as an evidence source for the overall prediction probability.

The preferred method, which represents the prefix tree by a graph, makes use of graph theory techniques to manage the combinatorial complexity caused by continuous sampling. The sequence of features generated from the raw input samples represents far more characters than the intended user input. By finding paths through the graph, possible sub-sequences of the full feature sequence are selected. Furthermore, by generating a graph/prefix tree which is a sub-graph of the reference dictionary prefix tree, the path-finding algorithm is made more efficient at searching for valid paths. An additional benefit of generating a graph/prefix tree from the reference dictionary prefix tree is that it is possible to efficiently and accurately correct misspelt or mistyped words, which can, in part, be achieved by retaining certain letters of the reference dictionary that have not been identified as features.

If the sequence of features is modelled as an ordered set, F, all of the possible sub-sequences are the power set of F, that is, a set containing all of the possible subsets of F.

$F = \{f_1, f_2, \ldots, f_n\}$

$P(F) = \{\{f_1\}, \{f_1, f_2\}, \ldots\}$

If n is the number of features derived from the sample input so far, the size of P(F) grows as:

$|P(F)| = 2^{n}$
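To make the growth concrete, the following sketch enumerates every order-preserving sub-sequence of a short feature sequence using index combinations (the helper name is hypothetical). Six features already give 64 candidates, including the empty sub-sequence, which is why exhaustive enumeration does not scale.

```python
from itertools import combinations


def ordered_subsequences(features):
    """Yield every order-preserving sub-sequence of features (the power set of F)."""
    n = len(features)
    for k in range(n + 1):
        # combinations() emits index tuples in increasing order, so the
        # original order of the features is preserved in each subset.
        for idx in combinations(range(n), k):
            yield tuple(features[i] for i in idx)


features = ["c", "d", "o", "f", "g", "h"]
subs = list(ordered_subsequences(features))
print(len(subs))                # 64 == 2 ** 6
print(("d", "o", "g") in subs)  # True
```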

A naive algorithm considering all possible routes implied by some user input will therefore be of super-polynomial (exponential) time complexity. The present method of graphing the combinations with some dynamic pruning and route prioritisation mitigates this problem, providing a high-level strategy for dealing with the search space complexity.

As stated previously, the problem of estimating predictions on the basis of raw input samples can be broken down into configurable sub-problems: identifying a sequence of features from the sequence of raw samples, generating a graph on the basis of the features, and finding paths through the graph. Furthermore, probability estimates can be associated with the paths based on the cumulative cost of each path through the graph.

To reduce computational complexity, a set of paths of the graph can be pruned at a given time, based on their cumulative cost up to some point. For example, a path may be removed (pruned) when its updated probability estimate decreases such that the ratio of the probability estimate for that path to the probability estimate for the most likely path (of all the paths) falls below a threshold value.
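The ratio test can be applied directly to cumulative costs, because p/p_best = e^(−(c − c_best)). A minimal sketch, assuming paths are keyed by the character sequence they represent; the function name, example costs and threshold are hypothetical.

```python
import math


def prune_paths(path_costs, threshold):
    """Keep paths whose probability ratio to the most likely path is >= threshold.

    path_costs maps a path to its cumulative cost c (probability exp(-c)).
    p / p_best = exp(-(c - c_best)) >= threshold is equivalent to
    c - c_best <= -ln(threshold), so no exponentials are needed.
    """
    if not path_costs:
        return {}
    best = min(path_costs.values())
    max_delta = -math.log(threshold)
    return {path: c for path, c in path_costs.items() if c - best <= max_delta}


costs = {"dog": 0.9, "of": 1.4, "dgh": 5.0}
print(prune_paths(costs, threshold=0.1))  # "dgh" is pruned
```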

Returning to the steps of the sub-process as described in FIG. 4, the system requires a feature identification means, a graph (or prefix tree of terms) generating means and a path-finding means. In the present case, the system and method use an algorithm comprising a feature identification algorithm, a graph generating algorithm, and a path-finding algorithm. The graph generating algorithm can be broken down into two different parts, either of which can be replaced to achieve different performance or quality characteristics. The first part is a “cost function” which assigns weights to possible connections in the graph/prefix tree of terms. The second part is an (optional) pruning function which can be used to dynamically discard routes through the graph that appear too expensive early on. The pruning function provides the advantage of improving the performance of the graph construction algorithm (potentially at the cost of accuracy).

If the “cost function” is configured to assign the cumulative cost to each of the nodes in the prefix tree of terms, then a path-finding algorithm configured to find the cheapest route is not needed, because the cheapest path can be determined from the cumulative cost values assigned to the leaves (as will be explained in more detail later). Thus, in this case, the algorithm need only comprise a feature identification algorithm and a graph generating algorithm. The predictor is configured to sort the paths by their cumulative cost to the target node and output the sequences the paths represent, with an associated probability, as predictions.

The individual steps in the sub-process will now be described in more detail. Returning to the first step, a feature identification algorithm is configured to generate features from the raw input samples. A feature is defined to represent a point on the stroke where the user may have intended to pass through one of the fixed targets (e.g. target points or target lines) on the keyboard. The feature identification algorithm need only generate a number of such optional features from the input samples (since the graph generating algorithm calculates the costs of including or skipping a feature).

A feature can be described by the following set of values:

$f_i = \{t_i, l_i, d_i, s_i\}$

where t_(i) is the intended target, l_(i) is the distance along the stroke to the feature, d_(i) is the distance between the feature (on the stroke) and the location of t_(i), and s_(i) is the index of the closest sample to the feature. Features described in this way can only ever apply to a single ‘target’ on the keyboard (targets may have an approximately one-to-one mapping to characters). For example, a feature for the ‘z’ target will never result in an ‘x’ being added to the input. For this reason, features cannot be computed from stroke topology alone; they are defined based on both the topology and the keyboard layout.

The feature identification algorithm adds features independently for each target on the keyboard, adding a feature at the local closest approach between the stroke and the target. A local closest approach is a local minimum in the separation between the stroke (curve) and the target (e.g. point or line). In the case where the target is a point (e.g. representing a character key on a virtual keyboard) the separation can be computed directly. However, other types of target can be used; for example, the target may be represented by a line, which could be straight or curved (e.g. in the case of a space bar on a virtual keyboard). In the case of a line target, the feature identification algorithm computes the point along the line at which the distance between the stroke curve and the target line is at a minimum. This point is then used to compute the separation, as for a standard point target. If the stroke curve crosses the target line, then the separation between the stroke curve and the target line will be zero, and the feature will be added at the point where the curve crosses the line. As shown in FIG. 5, the granularity of the local minimum detection is defined by the hysteresis of the feature detector, since there is latency between the sampling of the stroke and the identification of a feature.
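The separation for point and straight-line targets can be sketched as follows. This assumes a straight space-bar target modelled as a line segment and measures separation at the samples only, so a crossing that falls between two samples yields a small rather than exactly zero separation; a fuller implementation would interpolate between samples. Function names are hypothetical.

```python
import math


def distance_to_point(sample, target):
    """Separation from a stroke sample to a point target such as a key centre."""
    return math.dist(sample, target)


def distance_to_segment(sample, a, b):
    """Separation from a stroke sample to a straight line target from a to b."""
    ax, ay = a
    bx, by = b
    px, py = sample
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(sample, a)  # degenerate segment behaves as a point
    # Parameter of the orthogonal projection onto the line, clamped to the segment.
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    closest = (ax + u * dx, ay + u * dy)
    return math.dist(sample, closest)


space_bar = ((0.0, 0.0), (10.0, 0.0))  # hypothetical space-bar centre line
print(distance_to_segment((5.0, 2.0), *space_bar))   # 2.0
print(distance_to_segment((-3.0, 4.0), *space_bar))  # 5.0, nearest point is an end
```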

The stroke is unlikely to be smooth, because the user is most likely human (however, it is possible to use a smoothing algorithm if desired). The distance of the unsmoothed stroke from the middle of a key will fluctuate as the user passes it, potentially creating many local minima. It is preferred to configure the feature identification algorithm to identify features at points on the curve where there is some local minimum that is “significant”. The hysteresis constant defines the amount by which the distance from the target must have fluctuated for the minimum to be significant and a feature to be identified. The graph of FIG. 5 shows four local minima below the threshold distance, where only two of the minima are considered significant.

The advantage of hysteresis is that it avoids adding too many features for an unsmoothed stroke. Additionally, features are only added if the separation between feature and target is less than the threshold of the feature detector (see FIG. 5).
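The hysteresis rule can be sketched as a two-state scan over the separations from one target. This is an illustrative sketch, not the detector of the described system: the `Feature` fields follow the definition f_(i) = {t_(i), l_(i), d_(i), s_(i)} given earlier, while the function name, the example separations and the choice to emit a minimum that is still open when the stroke ends are assumptions.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """f_i = {t_i, l_i, d_i, s_i}."""
    target: str         # t_i: the key this feature may stand for
    curve_dist: float   # l_i: distance along the stroke to the feature
    separation: float   # d_i: stroke-to-target distance at the feature
    sample: int         # s_i: index of the closest sample


def detect_features(target, distances, curve_dists, hysteresis, threshold) -> List[Feature]:
    """Add a feature at each significant local minimum of the separation to one target.

    A minimum is significant once the separation has risen more than `hysteresis`
    above it; it becomes a feature only if it is below `threshold`.
    """
    features = []
    descending = True     # currently looking for a minimum
    best = 0              # index of the current candidate minimum
    peak = distances[0]   # running maximum while the separation is rising

    def emit(k):
        if distances[k] < threshold:
            features.append(Feature(target, curve_dists[k], distances[k], k))

    for k, d in enumerate(distances):
        if descending:
            if d < distances[best]:
                best = k
            elif d > distances[best] + hysteresis:
                emit(best)  # the separation has risen enough: minimum is significant
                descending, peak = False, d
        elif d > peak:
            peak = d
        elif d < peak - hysteresis:  # the separation is clearly falling again
            descending, best = True, k
    if descending:
        emit(best)  # assumption: a minimum still open at the end of the stroke counts
    return features


seps = [5, 3, 2.8, 3.1, 2.9, 6, 8, 4, 1, 1.2, 7]
arc = [float(k) for k in range(len(seps))]
print([f.sample for f in detect_features("o", seps, arc, 1.0, 4.0)])  # [2, 8]
print([f.sample for f in detect_features("o", seps, arc, 0.1, 4.0)])  # [2, 4, 8]
```

With a hysteresis of 1.0 the small wobble around samples 3 and 4 is ignored and only two minima become features; reducing the hysteresis to 0.1 lets that wobble produce an extra feature.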

Once the features have been identified, the next step (see FIG. 4) is the generation of a prefix tree or graph which comprises those features. In the preferred embodiment, the system and method use a graph generating algorithm.

The desired output of a graph generating algorithm is described with reference to FIG. 6a. A reference dictionary prefix tree is illustrated in FIG. 6b. The generated prefix tree of terms (or graph) is illustrated in FIG. 6a. As can be seen, the generated prefix tree/graph comprises only the relevant routes through the reference dictionary prefix tree, i.e. only those routes which are valid on the basis of the features.

The graph/prefix tree thus describes only the word sets relevant to the features. As can be seen, the prefix tree/graph is directional, because the features of the sequence remain in order within the graph; however, features from the full sequence of features can be skipped. For example, for the input features c, d, o, f, g, h, the word ‘dog’ is a valid route, where the features c, f and h have been skipped. As stated previously, the features identified by the feature identification algorithm represent more characters than the user will have intended to input, but features can be skipped by the graph generating algorithm.

In addition, the graph generating algorithm is preferably configured to insert into the graph/prefix tree a node representing a character that was not identified as a feature by the feature identification algorithm, if that node is in the dictionary prefix tree and is part of a valid path given the identified features. By way of non-limiting example, assume that the feature identification algorithm identifies the features ‘c, d, o, f, g, h, t’ in the example described above (in relation to FIGS. 6a and 6b). The graph generating algorithm would insert the node representing ‘a’ into the graph/prefix tree with a corresponding cost associated with its insertion, such that ‘cat’ is an additional path provided in the graph/prefix tree of FIG. 6a. The way in which costs are assigned to inserted nodes will be described below.

In the prefix tree, valid points at which to end a word, according to the provided reference dictionary, are represented with a terminal node. The terminal nodes are highlighted with bolder borders in FIGS. 6a and 6b to distinguish them from the non-terminal nodes. Note that all edges in the generated prefix tree/graph will have a cost associated with them, but for brevity only one route is shown in this example. Additionally, features are represented merely as the characters they are most likely to represent. In practice, a feature would have other data associated with it related to the cost function being used.

There are many ways to construct a graph, and different approaches will offer different performance characteristics or compromises in flexibility. The preferred algorithm uses a depth-first recursive construction method, i.e. the graph/prefix tree of terms is constructed (by the graph generating algorithm) in the order in which the nodes of the graph/prefix tree of terms would be visited by a depth-first search. The algorithm is configured to incorporate pruning at each recursive step.
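A depth-first construction along these lines can be sketched as below. It is a simplified sketch under stated assumptions: features are reduced to single characters, the edge cost is a hypothetical function of how many features are skipped, pruning is a simple cumulative-cost cap, repeated characters are not handled, and at most one character may be inserted between consumed features, with no insertion before the first feature. The names `build_trie`, `build_graph`, `insert_penalty` and `max_cost` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # character -> TrieNode
    terminal: bool = False


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True
    return root


def build_graph(features, trie, edge_cost, insert_penalty=None, max_cost=math.inf):
    """Depth-first construction of the prefix tree of terms.

    Returns {word: cheapest cumulative cost} for every terminal node reached.
    """
    results = {}

    def visit(i, node, prefix, cost, just_inserted):
        if cost > max_cost:
            return  # prune this branch and everything below it
        if node.terminal and prefix and not just_inserted:
            results[prefix] = min(cost, results.get(prefix, math.inf))
        # Consume a later feature j > i, skipping features i+1 .. j-1.
        for j in range(i + 1, len(features)):
            child = node.children.get(features[j])
            if child is not None:
                visit(j, child, prefix + features[j], cost + edge_cost(i, j), False)
        # Insert a dictionary character that was not identified as a feature.
        if insert_penalty is not None and prefix and not just_inserted:
            for ch, child in node.children.items():
                visit(i, child, prefix + ch, cost + insert_penalty, True)

    visit(-1, trie, "", 0.0, False)
    return results


def skip_cost(i, j):
    """Hypothetical edge cost: 0.1 per edge plus 0.5 per skipped feature."""
    return 0.1 + 0.5 * (j - i - 1)


trie = build_trie(["dog", "of", "cat"])
print(build_graph(list("cdofgh"), trie, skip_cost))
print(build_graph(list("cdofght"), trie, skip_cost, insert_penalty=2.0))
```

With the features c, d, o, f, g, h this yields only “dog” and “of”, matching FIG. 6a; adding the feature t and an insertion penalty also yields “cat” via an inserted ‘a’.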

Preferably, the prefix tree or graph generating algorithm covers implicit character repetition by considering any feature to possibly represent repeated instances of the character it represents, if there is a valid prefix tree route for it. Should this occur, the cost function will be called with the same feature as both the parent and the child parameter. The cost function implementation will have a rule for assigning sensible cost values to repeated features.

By way of a non-limiting example, an implementation of the cost function for a parent feature f′ and a child feature f can be as follows:

$\mathrm{cost} = s(f)\,c(f', f)$

where s(f) is a function that works with the “separation” distance (the distance from the feature f on the stroke to the key centroid) and c(f′, f) is a function that works with the “curve distance” delta, which is the distance along the curve between the two features. c(f′, f) = 1 if f′ and f are the same feature, so only the separation component is taken into account.
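The description leaves s and c unspecified. The following sketch uses hypothetical linear forms, normalised by an assumed key width, purely to show the structure of cost = s(f)·c(f′, f) and the repeated-feature rule; `KEY_SIZE` and the function bodies are assumptions.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    char: str
    curve_dist: float   # l: distance along the stroke to the feature
    separation: float   # distance from the feature to the key centroid


KEY_SIZE = 1.0  # hypothetical normalising constant (one key width)


def s(f):
    """Hypothetical separation term: grows as the stroke misses the key centre."""
    return 1.0 + f.separation / KEY_SIZE


def c(f_prev, f):
    """Hypothetical curve-distance term; exactly 1 for a repeated feature."""
    if f_prev is f:  # the same feature passed as both parent and child
        return 1.0
    return 1.0 + abs(f.curve_dist - f_prev.curve_dist) / KEY_SIZE


def edge_cost(f_prev, f):
    return s(f) * c(f_prev, f)


o = Feature("o", curve_dist=3.0, separation=0.2)
g = Feature("g", curve_dist=5.0, separation=0.5)
print(edge_cost(o, o))  # repeated 'o': only the separation term contributes
print(edge_cost(o, g))
```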

The final step in the sub-process comprises identifying valid paths and the probabilities associated with those paths. The cumulative costs at each terminating node will be used to calculate word probability.

In the particular case illustrated in FIG. 6a, the predictions for the given user input would have been “dog” and “of”. These are the only two words possible given the input sequence provided, and they have a set of independently calculated cost values associated with them for use in calculating their probability later. However, in the alternative described in relation to FIG. 6a, there is the additional prediction of “cat”, which has a calculated cost value associated with it.

The system may be configured to call a cost function only for feature paths that the prefix tree shows are valid.

One approach might be to find the least costly paths through the graph with a known path-finding algorithm such as Dijkstra or A*.

Another, preferred, approach is to record the cumulative cost at each graph node as the graph is being constructed. Given the tree-like structure of the graph of the preferred embodiment, there can only be one route to each “leaf”. That means that, if there is a way of identifying sensible “target” nodes, the paths can be sorted by the cumulative cost to those points, and the sequences they represent can be returned very efficiently, without requiring a path-finding algorithm.
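Because each node has a single parent, the cumulative cost and a back-pointer are enough to read off predictions. A minimal sketch, assuming the graph builder has already recorded cumulative costs and has supplied the target nodes; the names and costs are hypothetical, and the returned e^(−c) values are unnormalised.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    char: str
    cumulative_cost: float          # recorded while the graph is being built
    parent: Optional["Node"] = None

    def sequence(self):
        """Walk the single route back to the root to recover the character sequence."""
        chars = []
        node = self
        while node is not None and node.char:
            chars.append(node.char)
            node = node.parent
        return "".join(reversed(chars))


def predictions(target_nodes, top_n=3):
    """Sort target nodes by cumulative cost; no path-finding algorithm is needed."""
    ranked = sorted(target_nodes, key=lambda n: n.cumulative_cost)[:top_n]
    return [(n.sequence(), math.exp(-n.cumulative_cost)) for n in ranked]


root = Node("", 0.0)
d = Node("d", 0.6, root)
do = Node("o", 0.7, d)
dog = Node("g", 1.3, do)
o = Node("o", 1.1, root)
of = Node("f", 1.2, o)
print(predictions([dog, of]))  # "of" ranks ahead of "dog"
```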

Contrary to the known approaches, the present method does not limit the matching of the input stroke to a single word, as the graph structure created and the search methods outlined above contain a flexible representation of all the information. It is clear that the same feature identification and graph generation can be used to predict words from features representing only a prefix of a word, and phrases from features spanning multiple words.

The graph/prefix tree generating means effectively generates a sub-tree of the reference dictionary prefix tree with weights (or costs) added to it. Thus, it is possible to match prefixes, because the route through the graph with the lowest cost should be the most probable prefix. To bias the cost calculation for features that cause a word to be completed, the concept of a “terminal node” can be added to the graph/prefix tree, where a terminal node representing the last character in a word has the cost associated with it weighted by some factor dependent on a space probability function, as will be described in more detail below. Terminal nodes are shown in FIGS. 6a, 6b and 7 by a node which is shaded with a darker border, distinguishing it from non-terminal nodes.

To predict phrases comprising one or more words, word endings are identified by the graph/prefix tree generating means and the possibility of a space (or other word boundary delimiter) with an appropriate cost is incorporated into the graph, as is explained in greater detail below. This enables the system and method to chain words together, matching entire phrases if the input suggests this could be the desired output, irrespective of whether the user actually indicates word boundaries via their input (e.g. by gesturing across or close to a space bar on a virtual keyboard).

The graph generating algorithm can be configured to add a space character (or any other word boundary delimiter) to the graph as a child of some node N if:

-   N is terminal (a valid end of word); and
-   The cost c for a path passing through the terminal node N, given the space probability function, meets the pruning criteria being used. In one example, a path is pruned if the ratio of the probability of that path to the probability of the most likely path falls below a threshold.

The cost that is associated with a space incorporated into the graph will depend on whether or not the feature identification means identifies a feature corresponding to that space. For example, a user may traverse the space bar during a stroke on a virtual keyboard to explicitly signal a word boundary, the feature identification means identifying the space as a feature. Alternatively, the user may not gesture across or close to the space bar, for example because they have missed the space bar out (e.g. in error or intentionally) or because the virtual keyboard omits a target to indicate a term boundary (e.g. a keyboard without a space bar). In such a case, a space feature is not present in the set of features identified by the feature identification means.

FIG. 7 illustrates an example prefix tree containing a terminal node N. The standard cost associated with the terminal node N is determined by a cost function, as is described in more detail below. However, the cost that is associated with a path through a terminal node is modified by the cost that is associated with the space node that follows.

In the case where a feature corresponding to a word boundary character (e.g. a space) is identified by the feature identification means, and this space feature corresponds to the space node (i.e. the space feature occurs at an appropriate place in the directional set of features, given the graph), the graph generating means associates a cost with that space node using the standard cost function, as described below.

However, if a space feature is not identified by the feature identification means, a penalty is applied by the graph generating means to the space node, since the space node was not explicitly identified by the user (thereby increasing the cost of a path that passes through the terminal node). This penalty is preferably a fixed cost.

In the case of a keyboard with no space bar (e.g. as shown in FIGS. 1 and 2), the graph generating means may be configured to assign a small penalty, e.g. a low fixed cost, to the space node, because the user is not able to explicitly identify a space between words. In some embodiments, the penalty for a space node following a terminal node could be dropped altogether. It may be advantageous to include a penalty for the space node, even though it is not possible to explicitly input a space, because this may give more accurate predictions, for example in the case where a user is more likely to input, via a single gesture, a single word or a prefix of a single word (rather than multiple words or a phrase).

Thus, a space node that has a higher associated cost will result in a path through the terminal node having a greater cost associated with it, which essentially amounts to increasing the cost associated with the terminal node.

As discussed previously, the graph generating algorithm can be configured to insert a node corresponding to a character where that character has not been identified as a feature but is present in the reference dictionary prefix tree, e.g. the graph generating algorithm can insert a node corresponding to a letter not identified by the feature identification algorithm. The graph generating algorithm assigns a penalty, e.g. a fixed cost, to the inserted node, in the same way that it assigns a penalty to a space node when that space was not identified as a feature. Preferably, the penalty assigned to an inserted node will depend on the type of character corresponding to that inserted node, e.g. the fixed cost assigned to a space node will be different from the fixed cost assigned to a letter node, which may also be different from the fixed cost assigned to a punctuation node other than a space node.
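The penalty rules for inserted nodes can be sketched as a small lookup. All numeric values, the character classification and the names are hypothetical; the source only states that the penalties are preferably fixed, depend on the character type, and may be small (or zero) for a space on a keyboard without a space bar.

```python
# Hypothetical fixed penalties (costs) for nodes inserted without a matching feature.
INSERT_PENALTY = {"space": 1.5, "letter": 3.0, "punctuation": 4.0}
NO_SPACEBAR_SPACE_PENALTY = 0.2  # hypothetical low cost; could also be 0


def char_type(ch):
    if ch == " ":
        return "space"
    if ch.isalpha():
        return "letter"
    return "punctuation"


def node_cost(ch, matched_feature_cost=None, keyboard_has_spacebar=True):
    """Cost of a graph node for character ch.

    matched_feature_cost is the standard cost-function value when a feature was
    identified for ch, or None when the node is inserted without a feature.
    """
    if matched_feature_cost is not None:
        return matched_feature_cost
    if ch == " " and not keyboard_has_spacebar:
        return NO_SPACEBAR_SPACE_PENALTY  # the user cannot signal a space explicitly
    return INSERT_PENALTY[char_type(ch)]


print(node_cost(" "), node_cost(" ", keyboard_has_spacebar=False), node_cost("a"))
```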

The determination of the costs of a path in the case where a character has been inserted without a corresponding feature being identified will be discussed below.

FIG. 8 shows how a space character is represented in the resulting graph. As stated previously, each node in the graph is paired with its corresponding node in the reference dictionary prefix tree. Nodes that follow a “space node” are paired with the root of the reference prefix tree, such that, effectively, a new sub-tree of the reference prefix tree is generated at each of the nodes that follow the space character. This means that the algorithm can resume the prefix search appropriately for a new word. It is also possible to add some meta-data to the space node, so that contextual evidence can be used to prune the search continuing for the next word. Restarting the search at every possible space causes a massive increase in search space size. By using context information, it is possible to identify which words are likely to follow on from the space node, given the word or sequence of words preceding that space node. The cost value for a subsequent feature can be penalised if it represents a character for a word which is unlikely to follow the space node given the context information, thus enabling the context information to be used for pruning (since unlikely paths can be removed from the prefix tree).

Take an example in which a user enters the phrase “hi there” with a single stroke. The algorithm will have walked as far as “h->i” in the prefix tree and found that “i” is marked as a terminal node. A node representing a space character is added to the tree with an associated probability.

The likely words to follow “hi” and their associated probabilities can be attached to the space node, and the cost function can be configured to take this into account when attaching subsequent nodes to that space node. An example system and method which uses context information to inform later predictions is disclosed in international patent application publication number WO2010/112841 or international patent application no. PCT/GB2011/001419, the contents of which are incorporated by reference in their entirety.

Therefore, an unlikely phrase (which is potentially valid for the same input pattern, such as “hi threw”) can be assigned a much higher cost. A pruning function will then likely eliminate that path early, reducing the number of paths in the prefix tree which need to be explored.
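The effect of context on the “hi there”/“hi threw” example can be sketched by adding a context cost of −ln p(word | previous word) at each space node. This is a simplification: it applies the penalty once per word rather than per character as the graph is extended, and the next-word probabilities, the unseen-word floor and the trace costs are invented for illustration.

```python
import math

# Hypothetical next-word model attached to the space node that follows "hi".
NEXT_WORD_PROBS = {"hi": {"there": 0.30, "how": 0.20, "threw": 0.0001}}
UNSEEN_PROB = 1e-6  # hypothetical floor for words the context model has not seen


def phrase_cost(words, trace_costs):
    """Trace cost of each word plus a context cost -ln p(word | previous word)."""
    total = 0.0
    for k, word in enumerate(words):
        total += trace_costs[word]
        if k > 0:
            p = NEXT_WORD_PROBS.get(words[k - 1], {}).get(word, UNSEEN_PROB)
            total += -math.log(p)
    return total


trace = {"hi": 1.0, "there": 2.0, "threw": 1.9}  # hypothetical trace costs
likely = phrase_cost(["hi", "there"], trace)
unlikely = phrase_cost(["hi", "threw"], trace)
print(round(likely, 2), round(unlikely, 2))
```

Even though “threw” fits the trace slightly better in this made-up example, the context cost pushes “hi threw” far above “hi there”, so a ratio-based pruning rule would remove it early.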

The graph construction algorithm outlined previously will consider any feature in the sequence of features to have potentially been a valid starting point. That is, any of those features may be considered an indication of what the first character of the first word should be. To reduce computational complexity, it can be assumed that there is some degree of accuracy in the user input, in particular at the start of the stroke.

As discussed in the preceding description, computational complexity can be greatly reduced if candidate roots and “targets” can be identified. A candidate root represents a feature corresponding to the first letter of the first word predicted for a given sequence of features, and thus should correspond to the target of the keyboard the user intended to start the gesture on. A target candidate represents a feature corresponding to the target at the end of the gesture, i.e. the target (e.g. character key) the user intended to touch at the end of the gesture, whereafter the user removes their finger from the screen.

Identification of target candidates is not used to limit the search space, but can be used for updating the candidate predictions, as it gives a sensible set of leaves to use for cumulative cost analysis. There are many more leaves in the prefix tree than there are ‘valid ends’ to the sequence of features, where a ‘valid end’ is a feature identified as a target candidate. Thus, by identifying target candidates it is possible to identify valid paths through the prefix tree and greatly reduce the number of nodes to which a cumulative cost is assigned. For example, in FIG. 6a, if the target candidate was identified as ‘f’ (i.e. the end feature of the sequence of features is ‘f’), then the only valid path is “of” and the cumulative cost need only be assigned to the terminal node ‘f’. In this instance, the system would predict “of” and not “dog” (or “cat” for the alternative example discussed).

By identifying candidate targets and storing cumulative costs at these targets, predictions can be made without the use of a path-finding algorithm: paths with valid end features are identified and cumulative costs are assigned to the nodes that represent the valid end features, allowing the most probable path to be determined by ordering the cumulative costs and identifying the path(s) with the lowest cost. The predictor is required only to calculate the probabilities for the least costly path(s) from the associated cumulative cost(s) and output the most likely paths as predictions.

The determination of a candidate root and a target candidate is now described. Several “reasonable” candidate roots are identified to play the role of the location co-ordinates, both x and y, of the first sample of the stroke S, as it would be needlessly limiting to restrict the search space to a single value for each, unless it is possible to do so with absolute confidence. Likewise, several “reasonable” target candidates are identified.

The desired starting character for the first word in the sequence being entered is likely to be close to the key centroid where the input began. To determine the root candidate, it can therefore be assumed that there is a sample, and consequently a feature, within some constant factor of the maximum key separation distance from the beginning of the input stroke. Likewise, the desired end character for the last term in the sequence being entered is likely to be close to the key centroid where the input finished and the user broke contact with the screen.

In a preferred embodiment, to identify a root candidate the method comprises determining a “curve distance” attribute, l, of the features. The curve distance is defined by the following expression:

${l\left( f_{n} \right)} = {\sum\limits_{j = 1}^{s{(f_{n})}}{{s_{j} - s_{j - 1}}}}$

Here f_(n) is the nth feature, for which s(f_(n)) is the index of the closest sample, and ∥s_(j)−s_(j−1)∥ is the Euclidean distance between two adjacent samples. Thus the curve distance, l, of the nth feature is the total distance along the sequence of samples up to that feature.
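The curve distance is a running sum of distances between consecutive samples, so it can be precomputed once per stroke and looked up for each feature through its closest-sample index s(f_(n)). A minimal sketch with a hypothetical helper name.

```python
import math
from itertools import accumulate


def curve_distances(samples):
    """Entry j is the arc length sum_{k=1..j} ||s_k - s_{k-1}|| up to sample j."""
    steps = (math.dist(a, b) for a, b in zip(samples, samples[1:]))
    return [0.0] + list(accumulate(steps))


samples = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
l = curve_distances(samples)
closest_sample_of_feature = 1  # s(f_n) for some hypothetical feature f_n
print(l)                             # [0.0, 5.0, 11.0]
print(l[closest_sample_of_feature])  # l(f_n) = 5.0
```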

The implementation assumes a constant, k, for the maximum separation distance between two keys, as well as tuning parameters R and L for coefficients to be used in identifying root and target candidates from the input sequence, respectively.

The above function assumes a linear relationship between user error and key separation. More sophisticated methods could be used, involving motion velocity, which can be calculated from time stamps associated with each feature, or potentially pressure readings.
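The function referred to above is not reproduced in this section. As a hypothetical illustration of the linear assumption only, the sketch below takes as root candidates the features whose curve distance from the start of the stroke is at most R·k, and as target candidates those within L·k of the end of the stroke; the exact rule used in the described system may differ, and the function name is an assumption.

```python
def root_and_target_candidates(feature_curve_dists, stroke_length, k, R, L):
    """Hypothetical linear rule for candidate roots and target candidates.

    feature_curve_dists[i] is l(f_i); stroke_length is the total curve length.
    """
    roots = [i for i, l in enumerate(feature_curve_dists) if l <= R * k]
    targets = [i for i, l in enumerate(feature_curve_dists)
               if stroke_length - l <= L * k]
    return roots, targets


dists = [0.0, 0.5, 3.0, 9.6, 10.0]
print(root_and_target_candidates(dists, stroke_length=10.0, k=1.0, R=1.0, L=0.5))
```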

The graph generating algorithm that has been discussed models the possible routes with respect to a prefix tree. This algorithm calls out to a cost function, which is how it attaches weights to the graph to incorporate the physical topology of the input patterns. As stated above, in addition to calling out to a cost function, the graph generating algorithm can assign penalties, i.e. fixed costs, to a node inserted into the graph, where the character represented by that node has not been identified as a feature. Below, possible cost function implementations based on curve length, average distance between the user stroke and a reference stroke, stroke (trace) speed, and end-point detection are discussed.

One cost function implementation is based on trace curve length, as illustrated in FIG. 9. A simple first-order model for a user's trace between two points starts with the assumption that:

the ideal trace between two points goes through both points, and is theshortest possible distance

The error model is such that increased distance from either point yields a less likely curve, as does a longer curve. It follows that the ‘ideal user’, in this scheme, will visit each character in turn, in a straight line from the previous character.

One simple error model in this scheme is the following:

$p(S_i \mid f_i, f_{i+1}) = p_d(d_i)\,p_l\big(\lVert S_i \rVert - \lVert x_{i+1} - x_i \rVert\big)$

where S_(i) is part of the stroke (a sequence of samples), f_(i) and f_(i+1) are the features that delimit the partial stroke, p_(d) is the distance error model, p_(l) is the length error model, d_(i) is the distance between the target and the stroke, ∥S_(i)∥ is the length of the partial stroke, and ∥x_(i+1)−x_(i)∥ is the straight-line distance between the two targets. The meaning of each of the variables is demonstrated in FIG. 9. Both the distance and length error models may be Gaussian or exponential distributions, where the maximal probability is at zero distance or length error as appropriate. The equation above only includes a distance probability estimate for the first point, because the end of the current trace will form the start of the next, so the distance at the end, d_(i+1), should be included in the estimate for the next pair of character targets.
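A sketch of this model in log form, assuming zero-mean Gaussian shapes for both p_(d) and p_(l) (one of the options mentioned above); the standard deviations and function names are hypothetical, and a cost would be the negative of the returned value.

```python
import math


def gaussian_logpdf_zero_mean(x, sigma):
    """log N(x; 0, sigma^2): maximal at zero error, as the model requires."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def curve_length_log_likelihood(partial_stroke, x_i, x_next, d_i, sigma_d, sigma_l):
    """log p(S_i | f_i, f_{i+1}) = log p_d(d_i) + log p_l(||S_i|| - ||x_{i+1} - x_i||)."""
    stroke_len = sum(math.dist(a, b) for a, b in zip(partial_stroke, partial_stroke[1:]))
    excess = stroke_len - math.dist(x_i, x_next)  # zero for a straight trace
    return (gaussian_logpdf_zero_mean(d_i, sigma_d)
            + gaussian_logpdf_zero_mean(excess, sigma_l))


a, b = (0.0, 0.0), (2.0, 0.0)  # hypothetical key centres x_i and x_{i+1}
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(curve_length_log_likelihood(straight, a, b, 0.1, 0.5, 0.5))
print(curve_length_log_likelihood(detour, a, b, 0.1, 0.5, 0.5))  # lower: longer curve
```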

Another way to encode and evaluate the same assumption as the ‘curve length’ model (that a user takes the shortest path between two points) is to measure the average distance between the user's trace and a ‘reference trace’, as shown in FIG. 10. Under the shortest path assumption, the ‘reference trace’ is a straight line between the two keys. In this model, a possible error model is

$p(S_i \mid f_i, f_{i+1}) = p_a\left(A_i / \lVert x_{i+1} - x_i \rVert\right)$

where $A_i$ is the area enclosed by the trace and the straight path between the features, as shown in FIG. 10. An exponential or a Gaussian distribution is a possible choice for $p_a$. In this model, the area and the straight-line distance are simply used to calculate the average distance of the trace from the ‘best path’. In the same way, the ‘best path’ could be any other idealized path, for example a Hermite spline through the target points or a B-spline approximation to the target points.
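As a sketch of how $A_i$ might be computed from sampled points, the shoelace formula gives the area of the polygon formed by the trace and the straight path. Dividing that area by $\lVert x_{i+1} - x_i \rVert$ yields the average offset of the trace. The exponential $p_a$, its scale and the function names are assumptions for illustration.

```python
import math

def enclosed_area(trace, x_i, x_next) -> float:
    """A_i: area between the trace and the straight path from x_i to x_next.

    Shoelace formula over the polygon x_i -> trace samples -> x_next, closed by
    the straight segment back to x_i. Assumes the trace does not cross that
    segment; otherwise lobes on opposite sides would partly cancel.
    """
    polygon = [tuple(x_i), *(tuple(p) for p in trace), tuple(x_next)]
    twice_area = 0.0
    for (ax, ay), (bx, by) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += ax * by - bx * ay
    return abs(twice_area) / 2.0

def average_distance_likelihood(trace, x_i, x_next, scale=16.8) -> float:
    """p(S_i | f_i, f_{i+1}) = p_a(A_i / ||x_{i+1} - x_i||) with an exponential p_a."""
    mean_offset = enclosed_area(trace, x_i, x_next) / math.dist(x_i, x_next)
    return math.exp(-mean_offset / scale) / scale

# A single sample 10 px off the line between keys 48 px apart encloses a
# triangle of area 240, i.e. an average offset of 5 px.
print(enclosed_area([(24.0, 10.0)], (0.0, 0.0), (48.0, 0.0)))  # 240.0
```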

An alternative implementation of the cost function is based on trace speed. Most known techniques for sub-trace modelling are based purely on topology. They do not take into account that the trace also contains timing information, from which a ‘trace velocity’ may be derived. A complete model for trace input would include the speed of the trace. A simple model for trace speed might assume that:

the user's trace will have its greatest speed between features, and its lowest speed at features

The error model penalizes traces with low speed far from features. Such a model could accumulate probability along the trace between the two features, as follows:

p(S_(i)f_(i), f_(i + 1)) = p_(d)(d_(i))^(∫_(l_(i))^(l_(i + 1))log  p_(s)(s(a); a) a)

where $s(a)$ is the (possibly smoothed) trace speed as the curvilinear distance $a$ varies between its starting value $l_i$ and its finishing value $l_{i+1}$. The distribution $p_s$ is low when the speed is low at a point whose curvilinear distance from the start and end of the sub-trace is high.
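The accumulation can be approximated numerically from the (x, y, t) samples. The sketch below is a hypothetical instance. The form of $p_s$ is assumed: a penalty that grows as the speed falls and as the point moves away from both ends of the sub-trace. The parameters `k` and `s_ref` and the midpoint-rule integration are also assumptions, chosen to match the qualitative description above.

```python
import math

def speed_log_integral(samples, k=1.0, s_ref=0.5) -> float:
    """Approximate the integral of log p_s(s(a); a) da over one sub-trace.

    samples: (x, y, t_ms) tuples. Hypothetical p_s with
        log p_s(s; a) = -k * w(a) * exp(-s / s_ref),
    where w(a) in [0, 0.5] is the curvilinear distance from a to the nearer end
    of the sub-trace, as a fraction of its length. Slow stretches far from both
    features are penalised most. Midpoint rule, one term per segment.
    """
    seg_len = [math.dist(p[:2], q[:2]) for p, q in zip(samples, samples[1:])]
    total = sum(seg_len)
    if total == 0.0:
        return 0.0
    integral, a = 0.0, 0.0
    for (p, q), da in zip(zip(samples, samples[1:]), seg_len):
        dt = q[2] - p[2]
        speed = da / dt if dt > 0 else math.inf  # pixels per millisecond
        a_mid = a + da / 2.0
        w = min(a_mid, total - a_mid) / total
        integral -= k * w * math.exp(-speed / s_ref) * da
        a += da
    return integral

def speed_likelihood(samples, d_i, sigma=29.4, **kwargs) -> float:
    """p(S_i | f_i, f_{i+1}) = p_d(d_i) * exp(integral), unnormalised Gaussian p_d."""
    p_d = math.exp(-d_i * d_i / (2.0 * sigma * sigma))
    return p_d * math.exp(speed_log_integral(samples, **kwargs))

fast = [(0, 0, 0), (24, 0, 10), (48, 0, 20)]     # 2.4 px/ms
slow = [(0, 0, 0), (24, 0, 200), (48, 0, 400)]   # 0.12 px/ms
print(speed_likelihood(fast, 0.0) > speed_likelihood(slow, 0.0))  # True
```

Because the integral runs over curvilinear distance, a slow stretch is penalised in proportion to its length.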

Another implementation of the cost function is based on end-point direction, as shown in FIG. 11. Curve direction can be determined trivially from the topology. It is potentially a powerful way to incorporate neighbouring points into the trace model. If the ‘ideal’ trace is a Hermite spline going through the four points, the direction of the curve at $x_i$ is:

$\hat{t}_i = \frac{x_{i+1} - x_{i-1}}{\lVert x_{i+1} - x_{i-1} \rVert}$

that is, the unit vector between the previous and next points. An error model using this might take the following form:

$p(S_i \mid f_i, f_{i+1}) = p_d(d_i)\, p_t(t_i; \hat{t}_i)$

where $t_i$ is the (possibly smoothed) direction vector of the curve at the start of the sub-trace. The distribution $p_t$ gives high probability to vectors in similar directions and low probability to vectors in dissimilar directions. The various vectors and values involved are shown in FIG. 11.
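A minimal sketch of this end-point model follows. The von Mises-style choice $p_t \propto \exp(\kappa(\cos\theta - 1))$ is an illustrative assumption, where $\theta$ is the angle between $t_i$ and $\hat{t}_i$. The smoothing of $t_i$ over the first few samples, the parameter values and the function names are assumptions too. Any distribution that favours similar directions would fit the description.

```python
import math

def unit(v):
    """Normalise a 2-D vector (must be non-zero)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def reference_direction(x_prev, x_next):
    """t_hat_i: unit vector from the previous target x_{i-1} to the next target x_{i+1}."""
    return unit((x_next[0] - x_prev[0], x_next[1] - x_prev[1]))

def start_direction(sub_trace, k=3):
    """t_i: direction at the start of the sub-trace, smoothed over up to k samples."""
    (x0, y0), (x1, y1) = sub_trace[0], sub_trace[min(k, len(sub_trace) - 1)]
    return unit((x1 - x0, y1 - y0))

def endpoint_direction_likelihood(sub_trace, x_prev, x_next, d_i, sigma=29.4, kappa=4.0):
    """p_d(d_i) * p_t(t_i; t_hat_i), with unnormalised Gaussian p_d and von Mises-style p_t."""
    t, t_hat = start_direction(sub_trace), reference_direction(x_prev, x_next)
    cos_angle = t[0] * t_hat[0] + t[1] * t_hat[1]
    p_d = math.exp(-d_i * d_i / (2.0 * sigma * sigma))
    return p_d * math.exp(kappa * (cos_angle - 1.0))  # p_t factor is 1.0 when aligned

aligned = [(48.0, 0.0), (60.0, 0.0), (72.0, 0.0), (84.0, 0.0)]
print(endpoint_direction_likelihood(aligned, (0.0, 0.0), (96.0, 0.0), 0.0))  # 1.0
```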

Another implementation of the cost function may be based on curve direction. Average distance and speed errors can be summed along the curve by integration. In the same way, it is possible to assume a model for curve direction and to evaluate the direction error over the whole curve, not just at the end-points. A simple direction model could be of the form:

p(S_(i)f_(i), f_(i + 1)) = p_(d)(d_(i))^(∫_(l_(i))^(l_(i + 1))log  p_(t)(t_(i)(a); t̂_(i)(a)) a)

which is simply a continuous version of the end-point direction equation. A simple choice for the reference direction, $\hat{t}(a)$, does not depend on $a$. It is just the direction from the previous key to the next key, i.e. from $x_i$ to $x_{i+1}$:

${\hat{t}(a)} = \frac{x_{i + 1} - x_{i}}{{t_{x + 1} - x_{i}}}$

Alternatively, $\hat{t}(a)$ could be a direction vector linearly interpolated between the end-point directions $\hat{t}_i$ and $\hat{t}_{i+1}$ defined previously. The distribution $p_t$ can be chosen in the same way as for the end-point direction model.
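Evaluated on sampled points, the integral becomes a sum over trace segments, each weighted by its length. This sketch uses the constant reference direction $\hat{t}(a)$ given above. The von Mises-style $\log p_t$, its concentration `kappa` and the function names are assumptions for illustration.

```python
import math

def curve_direction_log_integral(trace, x_i, x_next, kappa=4.0) -> float:
    """Approximate the integral of log p_t(t(a); t_hat(a)) da along a sub-trace.

    Uses the constant reference t_hat = (x_{i+1} - x_i) / ||x_{i+1} - x_i|| and a
    hypothetical log p_t = kappa * (cos(angle) - 1), which is 0 when a segment
    heads straight for the next key and negative otherwise. One term per segment.
    """
    ref_len = math.dist(x_i, x_next)
    rx, ry = (x_next[0] - x_i[0]) / ref_len, (x_next[1] - x_i[1]) / ref_len
    total = 0.0
    for (ax, ay), (bx, by) in zip(trace, trace[1:]):
        da = math.hypot(bx - ax, by - ay)
        if da == 0.0:
            continue  # repeated sample: no direction and no length
        cos_angle = ((bx - ax) * rx + (by - ay) * ry) / da
        total += kappa * (cos_angle - 1.0) * da
    return total

def curve_direction_likelihood(trace, x_i, x_next, d_i, sigma=29.4, kappa=4.0) -> float:
    """p_d(d_i) * exp(integral), with an unnormalised Gaussian p_d."""
    p_d = math.exp(-d_i * d_i / (2.0 * sigma * sigma))
    return p_d * math.exp(curve_direction_log_integral(trace, x_i, x_next, kappa))

straight = [(0.0, 0.0), (24.0, 0.0), (48.0, 0.0)]
print(curve_direction_likelihood(straight, (0.0, 0.0), (48.0, 0.0), 0.0))  # 1.0
```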

The cost functions described above provide a cost for each feature node of a valid path through a graph/prefix tree. Some nodes are inserted even though their character has not been identified as a feature, for example a space node, a letter node, a punctuation node, or a node representing a repeated instance of a feature. For such an inserted node, the cost function implementation proceeds as described above to assign a cost to the feature nodes, and ignores the inserted node when calculating those costs. The cumulative cost of a path comprising an inserted node is then the cumulative cost of the feature nodes, as calculated by the cost function, plus the cost assigned to the inserted node.

For example, suppose the identified features of the path “cat” were “c, t”. The cost function provides a cost for the feature nodes “c, t” as if the inserted “a” node did not exist. The cumulative path cost then also includes the penalty for inserting the “a” node.
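The bookkeeping can be sketched as follows. The sketch assumes a hypothetical `pair_cost(prev, cur)` that stands in for any of the cost functions above, applied to the sub-trace between two feature nodes. It also assumes an `insert_penalty` that gives the fixed cost of an inserted node. The values in the usage line are placeholders, not costs from the described system.

```python
from typing import Callable, Sequence, Tuple

def cumulative_path_cost(
    path: Sequence[Tuple[str, bool]],
    pair_cost: Callable[[str, str], float],
    insert_penalty: Callable[[str], float],
) -> float:
    """Cumulative cost of a path of (character, is_feature) nodes.

    Consecutive feature nodes are costed pairwise by pair_cost, skipping any
    inserted nodes between them (as if those nodes did not exist); each
    inserted node then adds its own fixed penalty.
    """
    features = [ch for ch, is_feature in path if is_feature]
    cost = sum(pair_cost(a, b) for a, b in zip(features, features[1:]))
    cost += sum(insert_penalty(ch) for ch, is_feature in path if not is_feature)
    return cost

# "cat" gestured as features "c", "t", with an inserted "a" node.
cat = [("c", True), ("a", False), ("t", True)]
print(cumulative_path_cost(cat, lambda a, b: 1.25, lambda ch: 2.0))  # 3.25
```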

For a node inserted to represent a repeated feature, the cost assigned to that node is as described above (cost = s(f) * c(f′, f)).

Thus, as discussed above, the present method and system provide a means of predicting words or phrases on the basis of a single continuous gesture, which may represent a prefix for a word, a word or a phrase. The system and method sample the gesture (its location, and optionally its pressure, at a plurality of times) as it is being performed on a gesture-sensitive keyboard. In this way they achieve flexibility in user input and text prediction. For example, the gesture is not limited to representing a single complete word, it need not explicitly indicate word boundaries, and the user is not required to spell or type the intended word(s) accurately.

As described above, word or phrase predictions are generated from the samples in three steps. The samples are converted to features. The features are used to generate a prefix tree of terms, or a graph representing a prefix tree of terms. Valid paths are then identified through the prefix tree or graph, the valid paths representing prediction candidates. Cost functions are provided which enable probability estimates to be associated with the paths, where the probability estimate can be derived from the cost $c$ using $P(C) = e^{-c}$.

The words represented by the most probable paths can be returned to a user as a set of word or phrase predictions, for example by displaying them on a display panel of a user interface. In one embodiment, only the most probable word or phrase is displayed to the user. Alternatively, the predictions and associated probability estimates can be passed to another system. That system may use them as an evidence source for a predictor that makes predictions on the basis of multiple evidence sources.

An example use of a touchscreen device in accordance with the present invention is provided below. As will be apparent from the preceding description, the system comprises a gesture-sensitive keyboard comprising a plurality of targets (e.g. points or lines), which is configured to receive a single continuous gesture as input from a user. The system also comprises a sampling means for sampling, at a plurality of times, the location of the gesture on the gesture-sensitive keyboard as the gesture is performed. Furthermore, the system includes a feature identification means configured to generate one or more features from the plurality of samples. It includes a prefix tree generating means configured to generate a prefix tree of terms which includes the one or more features. It also includes a path finding means configured to find one or more paths through the prefix tree of terms which are valid given the one or more features.

A system 10 in accordance with the present invention is illustrated in FIG. 12. The system 10 comprises a sampling means 1 configured to take samples 11 of the touch location at a plurality of times as the gesture is being performed. The samples 11 are passed to the feature identification means 2, which is configured to identify features from the samples 11, as described above in relation to the method.

The features 12 are passed to a graph/prefix tree of terms generating means 3, which is configured to construct a graph/prefix tree of terms 13 using the features 12. To do so, it uses a reference dictionary prefix tree 3c to generate a dynamic graph/sub-prefix tree of terms 13. This comprises the paths of the reference dictionary prefix tree 3c that are valid given the features 12 that have been identified. The graph/prefix tree of terms generating means 3 comprises a cost function 3a which is configured to assign costs to the nodes in the graph/prefix tree of terms 13. In the embodiment where the system comprises a path finding algorithm, the cost function 3a assigns costs to each node in the graph/prefix tree of terms 13. In the alternative embodiment, the system is configured to identify target nodes (as described above), and the cost function 3a is configured to assign a cumulative cost to the target nodes.

The graph/prefix tree of terms generating means 3 preferably comprises a pruning means 3b. The pruning means 3b is configured to remove paths with a low probability from the graph/prefix tree of terms 13. Preferably, the pruning means 3b removes a path when the ratio of its probability to the probability of the most probable path is below a threshold value. The graph/prefix tree of terms 13, with associated costs (and optionally pruned), is passed to a path finding means 4.
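The probability estimates are derived from costs as $e^{-c}$, so the ratio test can be applied to costs directly. A minimal sketch follows. It assumes candidate paths are held in a dictionary from path label to cumulative cost. The threshold value is an arbitrary example, and the costs are those of the prefix matching example below.

```python
import math

def prune_paths(path_costs, ratio_threshold=0.01):
    """Drop paths whose probability, relative to the most probable path, is
    below ratio_threshold.

    With p = exp(-c), p / p_best = exp(-(c - c_best)), so the ratio test can be
    applied directly to costs: keep a path if c - c_best <= -log(threshold).
    """
    if not path_costs:
        return {}
    best = min(path_costs.values())
    max_extra_cost = -math.log(ratio_threshold)
    return {path: c for path, c in path_costs.items() if c - best <= max_extra_cost}

costs = {"ty": 1.6291, "ry": 4.5319, "th": 5.3869, "rh": 8.2893}
print(sorted(prune_paths(costs, 0.01)))  # ['ry', 'th', 'ty']
```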

The path finding means 4 is configured to identify one or more of the least costly (and thus most probable) paths 14 through the graph/prefix tree of terms 13. It can do this in either of two ways. One is to identify the path(s) with the lowest cumulative cost at target nodes, e.g. by ordering the paths by their cumulative costs and returning the one or more paths with the lowest cost(s). The other is to find the least costly paths through the graph/prefix tree of terms 13 using a path finding algorithm. The paths 14 are passed to a predictor 5, which generates predictions 15 on the basis of the paths 14. The predictions 15 comprise one or more words or phrases with associated probability estimates. The words or phrases are formed from the features 12 represented by the nodes of the paths 14. The probability estimates are determined from the cost associated with each path, as described above.
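One possible path finding algorithm is a uniform-cost (Dijkstra-style) search, sketched below. It assumes the graph/prefix tree is given as a mapping from each node to its children and their non-negative step costs. This is a standard technique swapped in for illustration; the description does not prescribe a particular algorithm. The step costs in the usage example are hypothetical, chosen so that the leaf totals match the prefix matching example.

```python
import heapq
from itertools import count

def cheapest_paths(children, root, is_target, k=3):
    """Return up to k (cumulative_cost, path) pairs ending at target nodes, cheapest first.

    children maps a node to a list of (child, step_cost) pairs with non-negative
    costs. In a tree each node has exactly one path from the root, so a
    uniform-cost (best-first) search pops target nodes in order of cost.
    """
    tie = count()  # tie-breaker so the heap never compares paths
    frontier = [(0.0, next(tie), [root])]
    found = []
    while frontier and len(found) < k:
        cost, _, path = heapq.heappop(frontier)
        node = path[-1]
        if is_target(node):
            found.append((cost, path))
        for child, step in children.get(node, []):
            heapq.heappush(frontier, (cost + step, next(tie), path + [child]))
    return found

children = {
    "": [("t", 0.5), ("r", 2.0)],
    "t": [("ty", 1.1291), ("th", 4.8869)],
    "r": [("ry", 2.5319), ("rh", 6.2893)],
}
for cost, path in cheapest_paths(children, "", lambda n: len(n) == 2):
    print(path[-1], round(cost, 4))
```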

As described above, the predictions 15 represented by the most probable paths can be returned to a user as a set of word or phrase predictions, for example by displaying them on a display panel of a user interface. In one embodiment, only the most probable word or phrase is displayed to the user. Alternatively, the predictions and associated probability estimates 15 can be passed to another system. That system may use them as an evidence source for a second predictor that makes further predictions on the basis of multiple evidence sources.

Example Use of the System of the Present Invention

The system 10 requires a description of the keyboard layout being used in order to convert the raw input samples 11 to the feature stream 12. For the examples that follow, the following configuration is used:

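| Key | X | Y |
|---|---|---|
| a | 24.0 | 120.0 |
| b | 264.0 | 200.0 |
| c | 168.0 | 200.0 |
| d | 120.0 | 120.0 |
| e | 168.0 | 120.0 |
| f | 168.0 | 120.0 |
| g | 216.0 | 120.0 |
| h | 264.0 | 120.0 |
| i | 312.0 | 120.0 |
| j | 312.0 | 120.0 |
| k | 360.0 | 120.0 |
| l | 408.0 | 120.0 |
| m | 360.0 | 200.0 |
| n | 312.0 | 200.0 |
| o | 408.0 | 40.0 |
| p | 456.0 | 40.0 |
| q | 24.0 | 40.0 |
| r | 168.0 | 40.0 |
| s | 72.0 | 120.0 |
| t | 216.0 | 40.0 |
| u | 312.0 | 40.0 |
| v | 216.0 | 200.0 |
| w | 72.0 | 200.0 |
| x | 120.0 | 200.0 |
| y | 264.0 | 40.0 |
| z | 72.0 | 200.0 |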

Two example uses of the system 10 are described below: one for a user intending to enter a prefix for a word, and one for a user intending to enter multiple words. They demonstrate the flexibility of the method and system for generating predictions from gesture input.

Prefix Matching

The system 10 is required to output the desired first one or more letters making up the prefix with sufficiently high probability. This enables a wider prediction system to use context and other sources to produce more accurate full-word predictions from it. Alternatively, the system 10 can output a prediction 15 for display to a user on the basis of the identified prefix. In that case it displays the most probable word(s) that the prefix represents, without passing the prefix to a larger system.

As stated previously, the sampling means 1 of the system 10 samples the gesture as it is being performed to generate a sequence of samples 11, where the data in each sample consists of a position vector, a pressure reading and a time value.

In the example, as shown in FIG. 13, the user has begun to enter the word “type” by moving their finger across the target “keys” of the keyboard, but the system 10 has only taken a few samples 11 so far. The data below was taken from a device in which the sampling means 1 samples at ~60 Hz and time is measured in milliseconds.

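| x | y | p (pressure) | t (ms) |
|---|---|---|---|
| 218.14 | 40.04 | 0.46 | 0 |
| 218.52 | 40.93 | 0.47 | 13 |
| 218.52 | 40.93 | 0.47 | 22 |
| 218.53 | 40.93 | 0.48 | 34 |
| 220.92 | 50.12 | 0.49 | 46 |
| 222.69 | 55.78 | 0.50 | 57 |
| 224.83 | 62.08 | 0.50 | 71 |
| 227.11 | 70.64 | 0.50 | 81 |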

The feature identification means 2 of the system 10 first converts the sample sequence 11 to a sequence of features 12, e.g. by using a feature identification algorithm as described above. In the preferred embodiment, the tuning parameters for retaining features (the threshold distance and the hysteresis) are a function of the minimum separation distance between keys. In the layout provided above, the minimum separation is 48 pixels. The following parameters are therefore used:

$h = 0.5 \times 48 = 24$

$t = 1.5 \times 48 = 72$

where $h$ is the hysteresis and $t$ is the distance threshold. The feature identification algorithm yields the following features for the input samples 11:

| x | y | Features identified {c, x, y, s, t} |
|---|---|---|
| 218.14 | 40.04 | None (no previous values) |
| 218.52 | 40.93 | {“t”, 218, 40, 2.14, 0}, {“r”, 218, 40, 50.14, 0} |
| 218.52 | 40.93 | None |
| 218.53 | 40.93 | None |
| 220.92 | 50.12 | None |
| 222.69 | 55.78 | {“y”, 221, 53, 44.14, 13.6} |
| 224.83 | 62.08 | None |
| 227.11 | 70.64 | None |
| 229.00 | 83.86 | {“h”, 229, 83, 50.31, 45.3}, {“g”, 229, 83, 38.41, 45.3} |

The features 12 above consist of a target character c, the coordinates x and y, a separation distance s and a time value t. It can be seen that, for the eight raw samples 11 provided in this example, a sequence of five features 12 is generated.

The next step is to feed the sequence of features 12 to the graph generating algorithm 3 to construct a graph 13 representing the possible combinations of features from the sequence of features 12. The possible combinations are cross-referenced with a prefix tree 3c representing the language dictionary being used. This example assumes an English dictionary, so a graph 13 is created containing valid prefixes of English words only, as shown in FIG. 13.
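A much-simplified sketch of cross-referencing feature combinations against a dictionary prefix tree follows. It ignores costs and inserted nodes. It assumes that features are grouped by the sample at which they were identified (e.g. “t” and “r” together). It further assumes that at most one character is taken from each group, in order, that later groups may be skipped, and that the first group is always used. The dictionary and function names are hypothetical.

```python
def build_trie(words):
    """Dictionary prefix tree as nested dicts; the key "$" marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def valid_prefixes(feature_groups, trie):
    """Dictionary prefixes spelled by taking at most one character from each
    feature group, in order, always starting with the first group."""
    found = set()

    def extend(group_index, node, prefix):
        found.add(prefix)
        for j in range(group_index, len(feature_groups)):  # groups before j are skipped
            for ch in feature_groups[j]:
                if ch in node:
                    extend(j + 1, node[ch], prefix + ch)

    for ch in feature_groups[0]:
        if ch in trie:
            extend(1, trie[ch], ch)
    return found

trie = build_trie(["type", "tyre", "the", "then", "rye", "rhyme", "gym"])
groups = [{"t", "r"}, {"y"}, {"h", "g"}]
print(sorted(valid_prefixes(groups, trie)))  # ['r', 'rh', 'ry', 't', 'th', 'ty']
```

With this toy dictionary the two-letter results are the four candidates of the example; “tg” and “rg” are rejected because no dictionary word starts with them.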

A cost function 3a of the graph generating means 3 can be configured to assign a cost to the nodes of the graph 13 which represent the features 12. To generate costs associated with the various combinations, a distance-based cost function is used for this example. It uses Gaussian and exponential functions to assign a cost to the distance and to the difference in curve length, and it requires parameters for the decay of these functions. In this example, it is assumed that these parameters are again a function of the key separation:

$\delta = 0.7 \times 42.0 = 29.4$

$\lambda = 0.4 \times 42.0 = 16.8$

where $\delta$ is the decay parameter for the feature's distance from the key centroid, $d$. Similarly, $\lambda$ is the decay parameter for $l$, the difference between the curve length between features and the ideal shortest length. The cost values in FIG. 13 are thus calculated as:

$c = {\frac{d^{2}}{2\delta^{2}} + \frac{l}{\lambda}}$
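The cost is the negative logarithm of an unnormalised Gaussian in $d$ multiplied by an exponential in $l$, since $-\log\left(e^{-d^2/2\delta^2} e^{-l/\lambda}\right) = \frac{d^2}{2\delta^2} + \frac{l}{\lambda}$. A short Python sketch of the formula and of the example's decay parameters follows. The function names are illustrative. The inputs in the usage line are arbitrary and are not taken from FIG. 13.

```python
import math

def decay_parameters(key_separation: float):
    """delta = 0.7 * separation, lambda = 0.4 * separation, as in the example."""
    return 0.7 * key_separation, 0.4 * key_separation

def feature_cost(d: float, l: float, delta: float, lam: float) -> float:
    """c = d^2 / (2 delta^2) + l / lambda.

    d: distance of the feature from the key centroid; l: difference between the
    curve length between features and the ideal shortest length.
    """
    return d * d / (2.0 * delta * delta) + l / lam

delta, lam = decay_parameters(42.0)                 # approximately (29.4, 16.8)
c = feature_cost(d=delta, l=lam, delta=delta, lam=lam)
print(round(c, 6), round(math.exp(-c), 4))          # 1.5 0.2231
```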

A path finding means 4 is configured to identify one or more paths with the lowest cumulative cost. It does so either by using a path finding algorithm or by ordering cumulative costs (in the embodiment where cumulative costs are stored at target nodes). A predictor 5 is configured to determine one or more probability estimates associated with the one or more paths 14 from their costs. This provides one or more predictions 15, each of which comprises a prefix or word/phrase with an associated probability estimate.

The cumulative costs and probability estimates for the above example are shown below. In this example, the path finding means 4 would have identified “ty” as the least costly path. The predictor 5 of the system 10 is configured to determine a probability estimate for each path, where $p(C) = e^{-c}$. The predictor 5 can be configured to output the one or more most probable prefixes as predictions 15.

| Path C | Cost c | p(C) |
|---|---|---|
| “ty” | 1.6291 | 0.1961 |
| “ry” | 4.5319 | 0.0108 |
| “th” | 5.3869 | 0.0046 |
| “rh” | 8.2893 | 0.0003 |
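The p(C) column can be reproduced from the cost column with a few lines of Python. This is a minimal check of the conversion $p(C) = e^{-c}$, and the function name is chosen for illustration. The estimates are not normalised; dividing each by their sum would give a distribution over just these four candidates.

```python
import math

# Cumulative path costs from the table above.
path_costs = {"ty": 1.6291, "ry": 4.5319, "th": 5.3869, "rh": 8.2893}

def to_probabilities(costs):
    """p(C) = exp(-c) for each path, most probable first (not normalised)."""
    probs = {path: math.exp(-c) for path, c in costs.items()}
    return dict(sorted(probs.items(), key=lambda item: item[1], reverse=True))

for path, p in to_probabilities(path_costs).items():
    print(f"{path}: {p:.4f}")
```

Running it prints the same four values, to four decimal places, as the p(C) column of the table.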

This information can then be used alongside other evidence sources to produce a full-word prediction. Alternatively, it can be used alone to predict the word that the user intended to enter via the gesture, by returning the most probable full word given the prefix (in this example, the most probable word with the prefix ‘ty’). It may also be used alongside a mechanism for improving predictions based upon a user's response to those predictions, as found in many known devices utilising touch-sensitive keyboards.

Multiple Word Matching

In this example, a user is trying to enter a phrase consisting of two words, “a wet”, into a system 10, as shown in FIG. 14. The first stages of the process are exactly the same as for the prefix matching example.

FIG. 14 shows the input stroke pattern, and the table below gives the raw data recorded for this stroke. This example stroke produces the following input stream of samples 11, generated by the input sampling means 1.

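| x | y | p (pressure) | t (ms) |
|---|---|---|---|
| 17.8 | 120 | 0.4 | 0 |
| 19.7 | 118.4 | 0.4 | 74 |
| 20.6 | 117.6 | 0.4 | 88 |
| 24.9 | 109.8 | 0.4 | 103 |
| 35.2 | 95.7 | 0.6 | 118 |
| 41.3 | 86.3 | 0.6 | 132 |
| 47.4 | 78.5 | 0.6 | 147 |
| 48.8 | 75.4 | 0.6 | 162 |
| 50.7 | 69.9 | 0.6 | 177 |
| 52.6 | 65.2 | 0.6 | 192 |
| 53 | 63.7 | 0.6 | 207 |
| 53.5 | 62.1 | 0.6 | 221 |
| 54 | 59.8 | 0.6 | 236 |
| 54.9 | 55.8 | 0.6 | 251 |
| 57.7 | 46.5 | 0.6 | 266 |
| 62.9 | 30.8 | 0.6 | 281 |
| 64.3 | 29.3 | 0.6 | 297 |
| 64.8 | 28.5 | 0.6 | 310 |
| 65.2 | 27.7 | 0.6 | 340 |
| 65.7 | 27.7 | 0.6 | 355 |
| 66.2 | 27.7 | 0.6 | 385 |
| 66.2 | 28.5 | 0.7 | 400 |
| 66.6 | 28.5 | 0.7 | 429 |
| 67.1 | 28.5 | 0.7 | 444 |
| 69.4 | 30 | 0.7 | 459 |
| 78.4 | 33.2 | 0.7 | 474 |
| 100.4 | 36.3 | 0.7 | 489 |
| 116.4 | 37.9 | 0.7 | 503 |
| 132.3 | 39.4 | 0.7 | 518 |
| 149.7 | 38.6 | 0.7 | 533 |
| 167 | 39.4 | 0.7 | 549 |
| 180.6 | 39.4 | 0.7 | 564 |
| 192.8 | 39.4 | 0.7 | 578 |
| 201.3 | 39.4 | 0.7 | 593 |
| 203.6 | 39.4 | 0.7 | 608 |
| 206 | 38.6 | 0.7 | 623 |
| 206.9 | 38.6 | 0.7 | 638 |
| 207.4 | 38.6 | 0.7 | 653 |
| 207.9 | 37.9 | 0.7 | 668 |
| 208.3 | 37.9 | 0.7 | 682 |
| 208.8 | 37.1 | 0.7 | 697 |
| 209.7 | 36.3 | 0.7 | 712 |
| 210.2 | 36.3 | 0.7 | 728 |
| 211.6 | 36.3 | 0.7 | 742 |
| 213.5 | 36.3 | 0.7 | 757 |
| 215.8 | 36.3 | 0.5 | 772 |
| 222.4 | 38.6 | 0.3 | 787 |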

The feature identification means 2 converts the raw samples 11 into features 12, for example using a feature identification algorithm, to provide the following:

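| Target character | x | y | Separation | Curve distance |
|---|---|---|---|---|
| A | 20.6 | 117.6 | 4.2 | 3.7 |
| S | 35.2 | 95.7 | 44.1 | 30.1 |
| Q | 56.7 | 49.8 | 34.1 | 81.4 |
| W | 74.9 | 31.9 | 8.6 | 116.5 |
| E | 120.2 | 38.3 | 1.8 | 162.4 |
| R | 168.0 | 39.4 | 0.6 | 210.3 |
| T | 217.1 | 36.8 | 3.4 | 260.7 |
| Y | 222.4 | 38.6 | 41.6 | 226.3 |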

As before, a graph 13 is constructed from the features 12 by a graph generating means 3 (e.g. a graph generating algorithm). The graph generated by the graph generating means 3 in the present example is illustrated in FIG. 15. Because “a” is a word by itself, it is a terminal node of the prefix tree. The graph generating algorithm therefore adds a space node to the graph with an appropriate cost (i.e. a penalty), as explained above with reference to FIG. 8.

FIG. 15 illustrates where the construction algorithm inserts the space node in the graph. At that point, the search in the prefix tree resets, as explained previously, and any valid word from the root could be constructed from the remaining feature sequence. With this structure in place, the process for producing predictions is exactly the same as previously demonstrated. A path finding means 4 identifies the least costly paths 14 through the graph 13. A predictor 5 calculates a probability estimate for these paths 14 and outputs one or more predictions 15. The leaves at the end of the graph represent the whole character sequence from the root, which now includes at least one space. Thus, as demonstrated, phrases comprising multiple words and space characters can be predicted by the present system and method.
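The reset at a word boundary can be sketched by extending a prefix tree search. Whenever the search reaches the end of a complete word, it may insert a space with a fixed penalty and restart at the root. This simplified sketch ignores the cost function and treats each feature as a single character. It always uses the first feature and allows later features to be skipped. The dictionary, the penalty value and the function names are assumptions for illustration.

```python
def build_trie(words):
    """Dictionary prefix tree as nested dicts; the key "$" marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def phrases(features, trie, space_penalty=2.0):
    """Map each complete phrase spelled by an in-order subset of the features
    (always including the first) to its total space-insertion penalty."""
    results = {}

    def extend(i, node, text, penalty):
        if "$" in node:  # complete word: record it, then try a word boundary
            results[text] = min(penalty, results.get(text, float("inf")))
            search(i, trie, text + " ", penalty + space_penalty)  # reset to root
        search(i, node, text, penalty)  # or keep extending the current word

    def search(i, node, text, penalty):
        for j in range(i, len(features)):  # features i..j-1 are skipped
            if features[j] in node:
                extend(j + 1, node[features[j]], text + features[j], penalty)

    if features and features[0] in trie:
        extend(1, trie[features[0]], features[0], 0.0)
    return results

trie = build_trie(["a", "as", "at", "we", "wet", "west", "were"])
found = phrases(list("asqwerty"), trie)   # features A S Q W E R T Y
print(found["a wet"])  # 2.0
```

“a west” is not produced, because its “s” would have to come after “e”, whereas the “s” feature precedes it in the gesture.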

As described previously, one or more nodes could also have been inserted into the graph. This applies to nodes present in the reference dictionary prefix tree whose characters were not identified as features by the feature identification algorithm.

Furthermore, as explained previously, the user may explicitly identify word boundaries via the input gesture, e.g. by gesturing across or close to a space bar. In that case, the feature identification means identifies a feature associated with that term boundary. The graph construction algorithm can then be configured to assign a cost to the space node based on the cost function implementations discussed above.

As demonstrated by way of the examples, the present invention provides a means of generating prefix, word and phrase predictions from a single continuous gesture across a gesture-sensitive keyboard. Thus, the present method and system of generating text predictions and probabilities offers a significant advancement over known systems and methods. As will be understood from the preceding description, the present method and system allow subsequent information to influence earlier corrections. This is especially true when predicting a phrase, where the second word entered can be used to help infer the first word. Known systems and devices merely match a gesture (indicating a complete single word) to a word in a dictionary. Compared with them, the system and method therefore provide increased accuracy of word and phrase predictions.

It will be appreciated that this description is by way of example only; alterations and modifications may be made to the described embodiment without departing from the scope of the invention as defined in the claims.

1. A system comprising: a processor; memory storing instructions that, when executed by the processor, cause the system to perform operations comprising: sample at a plurality of times a location of a single continuous gesture on a gesture-sensitive keyboard as the gesture is being performed; generate one or more features from the plurality of samples, wherein the one or more features relates to one of a plurality of targets of the gesture sensitive keyboard indicative of a user-intended input based on the gesture; generate a prefix tree of terms which includes the one or more features; select one or more paths through the prefix tree of terms given the one or more features based on a distance between the feature and the one of the plurality of targets; and predict one or more terms from the one or more features.
2. The system of claim 1, wherein the instructions further cause the system to generate a graph by graph theory, wherein the graph represents the prefix tree of terms.
3. The system of claim 1, wherein the instructions further cause the system to generate said one or more features from the plurality of samples by identifying a location of the gesture on the gesture-sensitive keyboard indicative of a gesture intended to pass through a target of the gesture-sensitive keyboard.
4. The system of claim 1, wherein the one of the plurality of targets is a point target or a line target.
5. The system of claim 1, wherein the instructions further cause the system to identify a feature for a plurality of targets.
6. The system of claim 1, wherein the instructions further cause the system to retain a feature only if the minimum distance between the feature and the one of the plurality of targets is below a threshold distance.
7. The system of claim 1, wherein the prefix tree of terms is generated by retaining the terms of a dictionary prefix tree which are allowed given the one or more features.
8. The system according to claim 7, wherein a term of the dictionary prefix tree is retained when a feature does not correspond to the term.
9. The system of claim 1, wherein the plurality of targets correspond to letters of the alphabet, and wherein: the prefix tree of terms comprises one or more nodes representing a last letter of a completed word; and the instructions further cause the system to insert a node corresponding to a word boundary delimiter into the prefix tree where there is a node corresponding to the last letter in the completed word.
10. The system of claim 9, wherein the instructions further cause the system to reduce a probability associated with a node corresponding to the word boundary delimiter, if a feature associated with that word boundary delimiter has not been identified.
11. The system of claim 10, wherein a new prefix tree of terms is generated at the node corresponding to the word boundary delimiter by retaining the terms of a dictionary prefix tree which are allowed given the remaining features in a sequence of one or more features.
12. The system of claim 11, wherein the instructions further cause the system to associate metadata with the node representing a word boundary delimiter to prune the new prefix tree of terms on the basis of context data.
13. The system of claim 12, wherein the plurality of targets correspond to letters of the alphabet or word boundary delimiters, wherein the instructions further cause the system to allow a given feature to represent a repeated instance of the character it relates to, by retaining the terms of a dictionary prefix tree which include character repetition if there is a valid path given the one or more features through the prefix tree for the repeated character.
14. The system of claim 1, wherein the instructions further cause the system to prune the prefix tree of terms to remove paths through a graph for which a probability of the path is below a predetermined threshold.
15. The system of claim 1, wherein each feature comprises a distance metric which corresponds to the minimum distance between the gesture and the one of the plurality of targets and wherein the instructions further cause the system to use the distance metrics to generate a probability estimate associated with each path through the prefix tree of terms.
16. The system of claim 15, wherein the instructions further cause the system to return as the one or more terms, terms for which a corresponding route has a probability estimate above a threshold value.
17. The system of claim 1, wherein the instructions further cause the system to: determine one or more features which correspond to an end location of the gesture; and assign an indication of a cumulative probability for a given path to any node representing the one or more features that correspond to the location of an end of the gesture, only if that node corresponds to a leaf of the prefix tree of terms.
18. The system of claim 1, wherein the one or more terms is predicted on the basis of currently available samples.
19. The system of claim 1, wherein the instructions further cause the system to periodically update the prediction for the one or more terms as the single continuous gesture is being performed and more samples are generated.
20. The system of claim 1, wherein the instructions further cause the system to predict one or more words on the basis of a single continuous gesture which corresponds to the user gesturing over one or more characters on a gesture-sensitive keyboard intended to indicate the prefix for that one or more word.
21. The system of claim 1, wherein the instructions further cause the system to predict a phrase comprising a sequence of two or more words on the basis of a single continuous gesture which corresponds to gesturing over characters for multiple words on a gesture-sensitive keyboard.
22. The system of claim 1, wherein the instructions further cause the system to use context information to tailor the prediction of the one or more terms.
23. The system of claim 1, wherein the instructions further cause the system to predict the one or more terms on the basis of a topography of the gesture-sensitive keyboard in combination with at least one of gesture velocity or gesture curve direction.
24. The system of claim 1, wherein the instructions further cause the system to predict a path through the prefix tree dependent on a topography of the gesture between two features and targets of the keyboard associated with those two features.
25. The system of claim 24, wherein a probability of the path is based on a monotonically decreasing function of a difference between a straight-line distance between the two targets and a curved length of the gesture between the two targets.
26. The system of claim 24, wherein a probability of the path is based on a monotonically decreasing function of a difference between a direction of a straight-line between the two targets and a gesture direction at each point between the two targets.
27. A method for predicting one or more terms from a single continuous gesture across a gesture-sensitive keyboard comprising: sampling at a plurality of times a location of the gesture on the gesture-sensitive keyboard as the gesture is performed; and predicting one or more terms from the plurality of samples by: generating one or more features from the plurality of samples, wherein the one or more features relates to a target on the gesture-sensitive keyboard indicative of an input based on the gesture; generating a prefix tree of terms which includes the one or more features; and selecting one or more paths through the prefix tree of terms given the one or more features based on a distance between the feature and the target.
28. A user interface comprising: a gesture-sensitive keyboard configured to accept text entered by a user; and sample a location of a single continuous gesture on the gesture-sensitive keyboard as the gesture is being performed; wherein the gesture-sensitive keyboard does not include a key corresponding to a word boundary delimiter.